# Training Colab notebook for **[RHVoice](https://github.com/RHVoice/RHVoice/)**

Colab notebook made by [rmcpantoja](http://github.com/rmcpantoja)

---

### Warning

This notebook is based on the RHVoice wiki page [Creating a new voice for RHVoice](https://github.com/RHVoice/RHVoice/wiki/Creating-a-new-voice-for-RHVoice.) by Grzezlo. Its purpose is to make the training process easier by running it in the cloud.

It summarizes that tutorial and lets you put it into practice here, training RHVoice models online with less setup. Run the cells in order, because later cells depend on the files and directories created by earlier ones. The sections follow the structure of the tutorial, and you can browse them from the table of contents in the left panel.

---

last update: 2022-12-10

# 1. Installation of common tools

In [None]:
#@markdown ## Install dependencies, Praat, Speech Tools and Festival
print("Running first steps...")
# -p avoids an error when the cell is run a second time
!mkdir -p /content/tts
%cd /content/tts
# -y answers the confirmation prompt, which cannot be answered interactively here
!sudo apt-get install -y libspeechd-dev
# optional but important:
!sudo apt-get install -y gnulib libx11-dev libncurses-dev gawk csh
!sudo apt-get install -y sox scons libasound2-dev
!wget https://github.com/praat/praat/releases/download/v6.3.02/praat6302_linux64barren.tar.gz
!tar -xvf praat6302_linux64barren.tar.gz
!git clone --recurse https://github.com/festvox/speech_tools.git
!git clone --recurse https://github.com/festvox/festival.git
print("Compiling Festival and Speech Tools...")
%cd /content/tts/speech_tools
#!sudo apt-get update
#!sudo apt-get upgrade
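# `!export` would only affect a throwaway subshell; %env sets the variable for the `!` commands that follow
%env CPPFLAGS=-UPHNALG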
!./configure
!make
%cd ../festival
!./configure
!make
%cd ..
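
Because the builds above take a long time and a failed `make` is easy to miss in the log, a short check that the expected binaries exist can save time later. The following is a minimal sketch, not part of the original tutorial; `missing_paths` is a hypothetical helper, and the two binary locations are assumptions about where the Speech Tools and Festival builds place their executables.

```python
from pathlib import Path

def missing_paths(paths):
    """Return the paths (as strings) that do not exist on disk."""
    return [str(p) for p in paths if not Path(p).exists()]

# Assumed build outputs; adjust them if your build places the binaries elsewhere.
expected = [
    "/content/tts/speech_tools/bin/ch_wave",
    "/content/tts/festival/bin/festival",
]
missing = missing_paths(expected)
print("All tools found." if not missing else f"Missing: {missing}")
```

If something is reported as missing, scroll back through the output of the cell above and look for the first compiler error.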

In [None]:
#@markdown ## Download SPTK
# -L follows redirects, in case the mirror forwards the request
!curl -L -O http://kumisystems.dl.sourceforge.net/project/sp-tk/SPTK/SPTK-3.11/SPTK-3.11.tar.gz
!tar -xvf SPTK-3.11.tar.gz
%cd SPTK-3.11
!./configure --prefix=$(pwd)/build

## Apply SPTK patches before compiling

>Before compiling, we must fix some errors that would otherwise arise with newer versions of GCC (the GNU Compiler Collection).

>A much older GCC would compile SPTK without errors, but installing such an old compiler could itself be problematic.

### Patch psgr.h

In [None]:
%%writefile /content/tts/SPTK-3.11/bin/psgr/psgr.h
/* ----------------------------------------------------------------- */
/*             The Speech Signal Processing Toolkit (SPTK)           */
/*             developed by SPTK Working Group                       */
/*             http://sp-tk.sourceforge.net/                         */
/* ----------------------------------------------------------------- */
/*                                                                   */
/*  Copyright (c) 1984-2007  Tokyo Institute of Technology           */
/*                           Interdisciplinary Graduate School of    */
/*                           Science and Engineering                 */
/*                                                                   */
/*                1996-2017  Nagoya Institute of Technology          */
/*                           Department of Computer Science          */
/*                                                                   */
/* All rights reserved.                                              */
/*                                                                   */
/* Redistribution and use in source and binary forms, with or        */
/* without modification, are permitted provided that the following   */
/* conditions are met:                                               */
/*                                                                   */
/* - Redistributions of source code must retain the above copyright  */
/*   notice, this list of conditions and the following disclaimer.   */
/* - Redistributions in binary form must reproduce the above         */
/*   copyright notice, this list of conditions and the following     */
/*   disclaimer in the documentation and/or other materials provided */
/*   with the distribution.                                          */
/* - Neither the name of the SPTK working group nor the names of its */
/*   contributors may be used to endorse or promote products derived */
/*   from this software without specific prior written permission.   */
/*                                                                   */
/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND            */
/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,       */
/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF          */
/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE          */
/* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */
/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,          */
/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED   */
/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,     */
/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */
/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,   */
/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY    */
/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE           */
/* POSSIBILITY OF SUCH DAMAGE.                                       */
/* ----------------------------------------------------------------- */
#include <stdio.h>

extern struct bbmargin {               /*  Bounding Box Margin  */
   int top;
   int bottom;
   int left;
   int right;
} bbm;

struct page_media {
   char *size;
   int width;
   int height;
};

#define PU_PT (72.0/254.0)
#define SHIFT 15
#define LAND_OFFSET 254
#define SCALE 10

#define MIN_OFFSET 12
#define MAX_OFFSET 22
#define CHAR_HEIGHT 10

#define norm(x) (int)(x)

typedef struct cord {
   int x;
   int y;
} Cord;

extern char *filename;
extern char *title;
extern char *progname;

extern struct page_media paper[];
extern char *orientations[];

extern char *media;
extern int xleng;
extern int yleng;
extern int resolution;
extern int paper_num;
extern char *orientation;

extern int psmode;
extern int landscape;
extern int font_no;
extern int clip_mode;

void epsf_setup(FILE * fp, float shrink, int xoffset, int yoffset,
                struct bbmargin bbm, int ncopy);
void epsf_end(void);
void plot(FILE * fp);
void dict(void);


### Patch psgr.c

In [None]:
%%writefile /content/tts/SPTK-3.11/bin/psgr/psgr.c
/* ----------------------------------------------------------------- */
/*             The Speech Signal Processing Toolkit (SPTK)           */
/*             developed by SPTK Working Group                       */
/*             http://sp-tk.sourceforge.net/                         */
/* ----------------------------------------------------------------- */
/*                                                                   */
/*  Copyright (c) 1984-2007  Tokyo Institute of Technology           */
/*                           Interdisciplinary Graduate School of    */
/*                           Science and Engineering                 */
/*                                                                   */
/*                1996-2017  Nagoya Institute of Technology          */
/*                           Department of Computer Science          */
/*                                                                   */
/* All rights reserved.                                              */
/*                                                                   */
/* Redistribution and use in source and binary forms, with or        */
/* without modification, are permitted provided that the following   */
/* conditions are met:                                               */
/*                                                                   */
/* - Redistributions of source code must retain the above copyright  */
/*   notice, this list of conditions and the following disclaimer.   */
/* - Redistributions in binary form must reproduce the above         */
/*   copyright notice, this list of conditions and the following     */
/*   disclaimer in the documentation and/or other materials provided */
/*   with the distribution.                                          */
/* - Neither the name of the SPTK working group nor the names of its */
/*   contributors may be used to endorse or promote products derived */
/*   from this software without specific prior written permission.   */
/*                                                                   */
/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND            */
/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,       */
/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF          */
/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE          */
/* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */
/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,          */
/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED   */
/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,     */
/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */
/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,   */
/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY    */
/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE           */
/* POSSIBILITY OF SUCH DAMAGE.                                       */
/* ----------------------------------------------------------------- */

/********************************************************
*                                                       *
*   psgr:  xy-plotter simulator for EPSF                *
*                                                       *
*      Ver. 0.95  '92.3 T.Kanno                         *
*      Ver. 0.96  '92.8                                 *
*      Ver. 0.97  '92.10                                *
*      Ver. 0.98  '93.2                                 *
*      Ver. 0.99  '93.8                                 *
********************************************************/

static char *rcs_id = "$Id$";


/*  Standard C Libraries  */
#include <stdio.h>
#include <stdlib.h>

#ifdef HAVE_STRING_H
#include <string.h>
#else
#include <strings.h>
#ifndef HAVE_STRRCHR
#define strrchr rindex
#endif
#endif

#if defined(WIN32)
#include "SPTK.h"
#else
#include <SPTK.h>
#endif

#include "psgr.h"

struct bbmargin bbm;
char *BOOL[] = { "FALSE", "TRUE" };


#define MaxPaperTypes 14        /*  Paper Media  */

struct page_media paper[] = {
   {"FALSE", 9999, 9999},
   {"Letter", 612, 792},
   {"A0", 2378, 3362},
   {"A1", 1682, 2378},
   {"A2", 1190, 1682},
   {"A3", 842, 1190},
   {"A4", 842, 842},
/* {"A4",      595,  842}, */
   {"A5", 420, 595},
   {"B0", 2917, 4124},
   {"B1", 2063, 2917},
   {"B2", 1459, 2063},
   {"B3", 1032, 1459},
   {"B4", 729, 1032},
   {"B5", 516, 729},
};

char *orientations[] = {        /*  Orientation  */
   "Portrait",
   "Landscape",
};


/* Default Values */
#define MEDIA       "FALSE"
#define ORIENTATION "Portrait"
#define PSMODE      FA
#define PAPERNUM    2
#define XLENG       595
#define YLENG       842
#define LANDSCAPE   FA
#define RESOLUTION  600
#define FONTNO      1
#define CLIPMODE    FA
#define NCOPY       1
#define XOFFSET     0
#define YOFFSET     0
#define SHRINK      1.0
#define SCALE 10


void usage(int status)
{
   fprintf(stderr, "\n");
   fprintf(stderr, " %s - XY-plotter simulator for EPSF\n\n", progname);
   fprintf(stderr, "  usage:\n");
   fprintf(stderr, "       %s [ options ] [ infile ] > stdout\n", progname);
   fprintf(stderr, "  options:\n");
   fprintf(stderr, "       -t t  : title of figure      [NULL]\n");
   fprintf(stderr, "       -s s  : shrink               [%g]\n", SHRINK);
   fprintf(stderr, "       -c c  : number of copy       [%d]\n", NCOPY);
   fprintf(stderr, "       -x x  : x offset <mm>        [%d]\n", XOFFSET);
   fprintf(stderr, "       -y y  : y offset <mm>        [%d]\n", YOFFSET);
   fprintf(stderr, "       -p p  : paper                [%s]\n", MEDIA);
   fprintf(stderr,
           "               (Letter,A0,A1,A2,A3,A4,A5,B0,B1,B2,B3,B4,B5)\n");
   fprintf(stderr, "       -l    : landscape            [%s]\n",
           BOOL[LANDSCAPE]);
   fprintf(stderr, "       -r r  : resolution           [%d dpi]\n",
           RESOLUTION);
   fprintf(stderr, "       -b    : bold mode            [FALSE]\n");
   fprintf(stderr, "       -T T  : top    margin <mm>   [%d]\n", bbm.top);
   fprintf(stderr, "       -B B  : bottom margin <mm>   [%d]\n", bbm.bottom);
   fprintf(stderr, "       -L L  : left   margin <mm>   [%d]\n", bbm.left);
   fprintf(stderr, "       -R R  : right  margin <mm>   [%d]\n", bbm.right);
   fprintf(stderr, "       -P    : output PS            [%s]\n", BOOL[PSMODE]);
   fprintf(stderr, "       -h    : print this message \n");
   fprintf(stderr, "  infile:\n");
   fprintf(stderr, "       plotter commands             [stdin]\n");
   fprintf(stderr, "  stdout:\n");
   fprintf(stderr, "       PostScript codes (EPSF)\n");
#ifdef PACKAGE_VERSION
   fprintf(stderr, "\n");
   fprintf(stderr, " SPTK: version %s\n", PACKAGE_VERSION);
   fprintf(stderr, " CVS Info: %s", rcs_id);
#endif
   fprintf(stderr, "\n");
   exit(status);
}

char *progname, *filename = NULL, *title = NULL;
char *media = MEDIA, *orientation = ORIENTATION;

int paper_num = PAPERNUM, xleng = XLENG, yleng = YLENG, resolution = RESOLUTION;
int font_no = FONTNO, psmode = PSMODE, landscape = LANDSCAPE, clip_mode =
    CLIPMODE;


int main(int argc, char *argv[])
{
   char *str, flg, c;
   FILE *fp = NULL;
   int i;
   int ncopy = NCOPY, xoffset = XOFFSET, yoffset = YOFFSET;
   float shrink = SHRINK;

   progname = *argv;
   if (strrchr(progname, '/'))
      progname = (char *) (strrchr(progname, '/') + 1);
   while (--argc) {
      if (*(str = *++argv) == '-') {
         flg = *++str;
         if ((flg != 'P' && flg != 'l' && flg != 'b')
             && *++str == '\0') {
            str = *++argv;
            argc--;
         }
         switch (flg) {
         case 'P':
            psmode = 1 - psmode;
            break;
         case 't':
            title = str;
            break;
         case 'c':
            ncopy = atoi(str);
            break;
         case 's':
            shrink = atof(str);
            break;
         case 'x':
            xoffset = atoi(str) * SCALE;
            break;
         case 'y':
            yoffset = atoi(str) * SCALE;
            break;
         case 'p':
            media = str;
            break;
         case 'l':
            landscape = 1 - landscape;
            break;
         case 'r':
            resolution = atoi(str);
            break;
         case 'T':
            bbm.top = atoi(str) * 10;
            break;
         case 'B':
            bbm.bottom = atoi(str) * 10;
            break;
         case 'L':
            bbm.left = atoi(str) * 10;
            break;
         case 'R':
            bbm.right = atoi(str) * 10;
            break;
         case 'b':
            font_no += 2;
            break;
         case 'h':
            usage(0);
            break;
         default:
            fprintf(stderr, "%s : Invalid option '%c'!\n", progname, flg);
            usage(1);
            break;
         }
      } else
         filename = str;
   }
   for (i = 0; i < MaxPaperTypes; i++) {
      if (strcmp(media, paper[i].size) == 0) {
         paper_num = i;
         break;
      }
   }
   if (!landscape) {            /*  Portrait  */
      xleng = paper[paper_num].width;
      yleng = paper[paper_num].height;
   } else {                     /*  Landscape  */
      xleng = paper[paper_num].height;
      yleng = paper[paper_num].width;
   }
   xleng = xleng * (double) SCALE / shrink;
   yleng = yleng * (double) SCALE / shrink;

   orientation = orientations[landscape];

   if (filename != NULL) {
      fp = getfp(filename, "rt");
   } else {
      fp = tmpfile();
      while ((c = getchar()) != (char) EOF)
         fputc(c, fp);
      rewind(fp);
   }

   ungetc(flg = fgetc(fp), fp);
   if (flg == (char) EOF) {
      fprintf(stderr, "%s : Input file is empty!\n", progname);
      return (-1);
   } else if (flg != '=') {
      fprintf(stderr, "%s : Unexpected data format!\n", progname);
      return (-1);
   }

   epsf_setup(fp, shrink, xoffset, yoffset, bbm, ncopy);
   plot(fp);
   epsf_end();

   fclose(fp);
   return (0);
}
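
The patched files above appear to differ from stock SPTK mainly in where `bbm` lives: `psgr.h` only declares it with `extern`, and `psgr.c` holds the single definition. Newer GCC releases (10 and later) default to `-fno-common`, which turns a variable defined in a header shared by several source files into a "multiple definition" link error. As a rough sanity check that the files were written as intended, the pattern can be tested from Python. This is only a sketch: both helper names are hypothetical, and the regular expressions are a loose approximation of the C syntax.

```python
import re

def declares_extern_bbm(header_text):
    """True if the header declares `bbm` via `extern struct bbmargin { ... } bbm;`."""
    pattern = r"extern\s+struct\s+bbmargin\s*\{[^}]*\}\s*bbm\s*;"
    return re.search(pattern, header_text) is not None

def defines_bbm(source_text):
    """True if the source file contains the definition `struct bbmargin bbm;`."""
    pattern = r"^\s*struct\s+bbmargin\s+bbm\s*;"
    return re.search(pattern, source_text, re.MULTILINE) is not None

# Shortened stand-ins for the two patched files
sample_header = "extern struct bbmargin {\n   int top;\n   int bottom;\n} bbm;\n"
sample_source = '#include "psgr.h"\n\nstruct bbmargin bbm;\n'
print(declares_extern_bbm(sample_header), defines_bbm(sample_source))  # prints: True True
```

To check the real files, read `/content/tts/SPTK-3.11/bin/psgr/psgr.h` and `psgr.c` with `open(...).read()` and pass their contents to the two functions.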


# 1.1. Installation of common tools

In [None]:
#@markdown ## Compiling SPTK
!make
!make install
%cd ..

In [None]:
#@markdown ## Download and compile HTK
#@markdown ---
#@markdown ### important!
#@markdown To download HTK, you need to **[register](https://htk.eng.cam.ac.uk/)** on their website. Instructions here:
#@markdown * Fill in the registration form.
#@markdown * If your details are correct, you will receive an email with your user ID and password. That being the case, go back to this notebook and fill in these fields.
#@markdown ---
#@markdown ### HTK username:
userid = "" #@param {type:"string"}
#@markdown ---
#@markdown ### Password:
password = "" #@param {type:"string"}
#@markdown ---
%cd /content/tts
!curl -O https://"$userid":"$password"@htk.eng.cam.ac.uk/ftp/software/HTK-3.4.1.tar.gz
!curl -O https://"$userid":"$password"@htk.eng.cam.ac.uk/ftp/software/hdecode/HDecode-3.4.1.tar.gz
!tar -xvzf HTK-3.4.1.tar.gz
%cd /content/tts/htk
!apt-get install -y gcc-multilib
!dpkg --add-architecture i386
!apt update
!apt install -y libx11-dev:i386
!./configure --prefix=/content/tts/htk341
!mkdir /content/tts/htk341
!sed -i '77s/        /\t/' HLMTools/Makefile
!make
!make install
# `!cd` runs in a subshell and does not change the notebook's directory; use %cd
%cd ..
!rm -r /content/tts/htk
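
The download commands above embed the user ID and password directly in the URL as `user:password@host`. If the password contains characters such as `@`, `:` or `/`, the URL is likely to be parsed incorrectly. One possible workaround is to percent-encode the credentials first. The sketch below assumes that the download tool decodes percent-encoded credentials, as URL-based clients normally do; `build_auth_url` and the sample credentials are hypothetical.

```python
from urllib.parse import quote

def build_auth_url(user, password, host_and_path):
    """Embed percent-encoded credentials in an https URL."""
    # safe="" also encodes "/" so nothing in the credentials is read as URL syntax
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"https://{user_enc}:{password_enc}@{host_and_path}"

url = build_auth_url("jane", "p@ss:w/rd", "htk.eng.cam.ac.uk/ftp/software/HTK-3.4.1.tar.gz")
print(url)
# prints: https://jane:p%40ss%3Aw%2Frd@htk.eng.cam.ac.uk/ftp/software/HTK-3.4.1.tar.gz
```

The resulting string can then be passed to the download command as `"$url"` in place of the hand-built URL.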

In [None]:
#@markdown ## Download and compile HTS and hts_engine
#@markdown * >For RHVoice-related tasks we'll need HTS 2.3 and the older HTS 2.2.
#@markdown * >[The HTS engine](http://hts-engine.sourceforge.net/) is software that synthesizes speech waveforms from HMMs trained by the HMM-based Speech Synthesis System (HTS).
print("Downloading and extracting...")
%cd /content/tts
!curl -O http://hts.sp.nitech.ac.jp/archives/2.3/HTS-2.3_for_HTK-3.4.1.tar.bz2
!tar -xvf HTS-2.3_for_HTK-3.4.1.tar.bz2
!curl -O http://hts.sp.nitech.ac.jp/archives/2.2/HTS-2.2_for_HTK-3.4.1.tar.bz2
!tar -xvf HTS-2.2_for_HTK-3.4.1.tar.bz2
print("Compiling HTS 2.3...")
%cd /content/tts
!tar -xvf HTK-3.4.1.tar.gz
!tar -xvf HDecode-3.4.1.tar.gz
%cd htk
!patch -p1 -d . < ../HTS-2.3_for_HTK-3.4.1.patch
!mkdir /content/tts/hts23
!./configure --prefix=/content/tts/hts23 CFLAGS="-DARCH=__linux"
!make
!make install
%cd ..
!rm -r /content/tts/htk
print("Compiling HTS 2.2...")
%cd /content/tts
!tar -xvf HTK-3.4.1.tar.gz
!tar -xvf HDecode-3.4.1.tar.gz
%cd htk
!patch -p1 -d . < ../HTS-2.2_for_HTK-3.4.1.patch
!./configure --prefix=/content/tts/hts22
!mkdir /content/tts/hts22
!make
!make install
%cd ..
!rm -r /content/tts/htk
print("Downloading hts_engine...")
%cd /content/tts
!curl -O https://kumisystems.dl.sourceforge.net/project/hts-engine/hts_engine%20API/hts_engine_API-1.10/hts_engine_API-1.10.tar.gz
!tar -xvf hts_engine_API-1.10.tar.gz
print("Compiling...")
%cd hts_engine_API-1.10
!./configure --prefix=/content/tts/hts_engine_api110
!mkdir /content/tts/hts_engine_api110
!make
!make install
%cd ..
!rm -r /content/tts/hts_engine_API-1.10
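
The cell above repeats the same sequence for HTS 2.3 and HTS 2.2: unpack a fresh HTK tree, apply the HTS patch, configure with a version-specific prefix, build, install, and remove the tree. To make that pattern explicit, here is a sketch that only generates the command list and does not run anything. `hts_build_commands` is a hypothetical helper, and the commands assume that the archives and patches are already present in the current directory.

```python
def hts_build_commands(version, prefix, extra_configure_args=""):
    """Return the shell commands that patch and build HTK with one HTS version."""
    configure = f"./configure --prefix={prefix} {extra_configure_args}".strip()
    return [
        "tar -xf HTK-3.4.1.tar.gz",
        "tar -xf HDecode-3.4.1.tar.gz",
        f"patch -p1 -d htk < HTS-{version}_for_HTK-3.4.1.patch",
        f"mkdir -p {prefix}",
        f"cd htk && {configure} && make && make install",
        "rm -r htk",
    ]

# Print the steps for both versions built in this notebook
for version, prefix, extra in [
    ("2.3", "/content/tts/hts23", 'CFLAGS="-DARCH=__linux"'),
    ("2.2", "/content/tts/hts22", ""),
]:
    print(f"# HTS {version}")
    for command in hts_build_commands(version, prefix, extra):
        print(command)
```

Removing the `htk` directory between the two builds matters, because each HTS patch is meant to be applied to an unmodified HTK 3.4.1 tree.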

# 2. Running the original HTS demo CMU ARCTIC SLT (optional)

>The goal of this part is to ensure that all downloaded components work correctly, so that we can continue the RHVoice way in the next part. CMU ARCTIC SLT is one of the sample voice datasets used by the Festival speech synthesis system.

>The HTS demo CMU ARCTIC SLT is a package derived from the Festival one and prepared for easy use with HTS. It demonstrates the voice-building process for HTS.

In [None]:
#@markdown ## Download dataset and configure
%cd /content/tts
!curl -O http://hts.sp.nitech.ac.jp/archives/2.3/HTS-demo_CMU-ARCTIC-SLT.tar.bz2
!tar -xvf HTS-demo_CMU-ARCTIC-SLT.tar.bz2
%cd HTS-demo_CMU-ARCTIC-SLT
#Todo: patch for makefile here
!./configure --with-fest-search-path=/content/tts/festival/examples \
  --with-sptk-search-path=/content/tts/SPTK-3.11/build/bin \
  --with-hts-engine-search-path=/content/tts/hts_engine_api110/bin \
  --with-hts-search-path=/content/tts/hts23/bin

In [None]:
#@markdown ## Build statistical voice model
!make

# 3. Creating the real voice for RHVoice.

>The goal of this part is to create a fully qualified voice for RHVoice.

In [None]:
#@markdown ## Mount Google Drive
#@markdown ---
#@markdown It is important to mount your Google Drive to load your datasets, as well as save your RHVoice work.
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

In [None]:
#@markdown ## Install Python dependencies
#@markdown ---
#@markdown Important note! When this cell finishes executing, you may need to restart the runtime. If so, restart it and once done continue to the next cell.
%cd /content/tts
!pip install matplotlib argparse cython pysoundfile

In [None]:
#@markdown ## Install dependencies and build RHVoice. (part 2)
!pip install numpy scipy webrtcvad pyworld
!sudo apt install -y libparallel-forkmanager-perl jq
print("Building RHVoice...")
%cd /content/tts
!git clone --recurse https://github.com/RHVoice/RHVoice.git
%cd RHVoice
!scons dev=True
%cd ..

In [None]:
#@markdown ## Extract dataset and init working directory
#@markdown ---
#@markdown Set the path of the zipped audio dataset here. The dataset must contain:
#@markdown * >wav/: audio recordings in WAV format. Each file contains a short phrase.
#@markdown * >etc/arctic.data, etc/txt.done.data: transcripts of the recordings as text; both files are identical.
#@markdown ---
%cd /content/tts
#@markdown ### What is the dataset path? (zip format)
#@markdown For example: /content/drive/MyDrive/myvoices/testvoice/dataset.zip
dataset_path = "/content/drive/MyDrive/myvoices/testvoice/dataset.zip" #@param {type:"string"}
#@markdown ---
import zipfile
import os
if zipfile.is_zipfile(dataset_path):
  !unzip "$dataset_path"
else:
  print("Warning: the dataset path is not a zip file.")
print("Done. Initializing working directory...")
%cd /content/tts
!mkdir rhwork
%cd rhwork
!../RHVoice/src/scripts/general/voice-building-utils  init
print("Configuring common bin directory...")
%cd /content/tts
!mkdir bin
%cd bin
# create symbolic links:
!cp --symbolic-link ../SPTK-3.11/build/bin/* ./
!cp --symbolic-link ../hts_engine_api110/bin/* ./
!cp --symbolic-link ../hts23/bin/* ./
%cd ../rhwork
# update training.cfg
!jq --arg pwd "/content/tts" --arg tts "/content/tts"  '.festdir=$tts+"/festival/"|.bindir=$tts+"/bin"|.htk_bindir=$tts+"/htk341/bin"|.praat_path=$tts+"/praat"|.f0_method="praat_ac|sptk_reaper"|.hts22_bindir=$tts+"/hts22/bin"'  training.cfg >training2.cfg &&mv training2.cfg training.cfg
# prepare wav folder:
!mkdir wav
%cd wav
!cp -s /content/tts/wav/* ./
%cd ..
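
Since the dataset must pair every recording in `wav/` with a transcript line, it can help to validate the layout before starting the longer steps. The sketch below assumes that `etc/txt.done.data` uses the Festival-style format `( utterance_id "text" )` with one prompt per line, and that each recording is named `<utterance_id>.wav`. Both helper functions are hypothetical and not part of RHVoice.

```python
import re
from pathlib import Path

# One Festival-style prompt per line: ( utterance_id "text" )
PROMPT_RE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')

def parse_prompts(text):
    """Return {utterance_id: text} for every well-formed line."""
    prompts = {}
    for line in text.splitlines():
        match = PROMPT_RE.match(line.strip())
        if match:
            prompts[match.group(1)] = match.group(2)
    return prompts

def check_dataset(root):
    """Return a list of problems found in the dataset directory `root`."""
    root = Path(root)
    prompt_file = root / "etc" / "txt.done.data"
    if not prompt_file.is_file():
        return [f"missing {prompt_file}"]
    prompts = parse_prompts(prompt_file.read_text(encoding="utf-8"))
    wav_ids = {p.stem for p in (root / "wav").glob("*.wav")}
    problems = []
    for utt_id in sorted(set(prompts) - wav_ids):
        problems.append(f"no recording for prompt {utt_id}")
    for utt_id in sorted(wav_ids - set(prompts)):
        problems.append(f"no prompt for recording {utt_id}")
    return problems

print(parse_prompts('( arctic_a0001 "Author of the danger trail." )'))
# prints: {'arctic_a0001': 'Author of the danger trail.'}
```

Calling `check_dataset("/content/tts")` after unzipping should return an empty list if every prompt has a recording and vice versa.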

In [None]:
#@markdown ## Model settings and configure
#@markdown ### speaker name
speaker_name = "myvoice" #@param {type:"string"}
#@markdown ---
#@markdown ### Language with which you will train
language = "English" #@param ["Albanian", "Brazilian-Portuguese", "English", "Esperanto", "Georgian", "Kyrgyz", "Macedonian", "Russian", "Tatar", "Ukrainian"]
#@markdown ---
#@markdown ### Choose voice gender
gender = "male" #@param ["female", "male"]
#@markdown ---
!jq --arg pwd "/content/tts"  '.wavedir=$pwd+"/wav"|.speaker="$speaker_name"|.language="$language"|.gender="$gender"'  training.cfg >training2.cfg &&mv training2.cfg training.cfg
!../RHVoice/src/scripts/general/voice-building-utils  configure
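
The `jq` calls in this notebook edit `training.cfg` in place, which suggests that the file is plain JSON. Under that assumption, the same kind of update can be sketched in Python, which avoids shell quoting issues when a value contains quotes or spaces. `update_config` is a hypothetical helper, and the keys in the demo are only examples.

```python
import json
import os
import tempfile

def update_config(path, **settings):
    """Load a JSON config file, overwrite the given keys and save it back."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    config.update(settings)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return config

# Demonstration on a throwaway file instead of the real training.cfg
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "training.cfg")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump({"speaker": "", "gender": ""}, f)
    print(update_config(demo_path, speaker="myvoice", gender="male"))
    # prints: {'speaker': 'myvoice', 'gender': 'male'}
```

In the notebook this would be called as `update_config("training.cfg", speaker=speaker_name, language=language, gender=gender)` from inside the working directory.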

In [None]:
#@markdown ## Import recordings
%cd /content/tts
!mv /content/tts/praat_barren /content/tts/praat
!chmod +x praat
%cd rhwork
!../RHVoice/src/scripts/general/voice-building-utils import-recordings

In [None]:
#@markdown ## Define f0 range
#@markdown >F0 is the fundamental frequency of the speaker's voice. It changes over time with intonation.

#@markdown >One of the "features" steps in the HTS demo was the extraction of lf0 (the logarithm of F0). For this to succeed, the possible range of F0 must be defined.

!../RHVoice/src/scripts/general/voice-building-utils guess-f0-range
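
To illustrate why the F0 range matters, here is a simplified sketch of the relationship between F0 and lf0. It is not the algorithm used by `voice-building-utils`: it only takes the natural logarithm of each F0 value and treats frames outside the allowed range as unvoiced. The function name, the sample values and the unvoiced marker `-1.0e10` are assumptions made for this example.

```python
import math

UNVOICED_LF0 = -1.0e10  # marker assumed here for unvoiced frames

def f0_to_lf0(f0_values, f0_min, f0_max):
    """Convert F0 values in Hz to log F0, treating out-of-range frames as unvoiced."""
    lf0 = []
    for f0 in f0_values:
        if f0_min <= f0 <= f0_max:
            lf0.append(math.log(f0))
        else:
            lf0.append(UNVOICED_LF0)
    return lf0

# One silent frame, one plausible voiced frame, one frame above the allowed range
print(f0_to_lf0([0.0, 120.0, 480.0], f0_min=60.0, f0_max=300.0))
```

With a range that is too narrow, genuinely voiced frames are lost as unvoiced; with a range that is too wide, pitch-tracking errors such as octave jumps are more likely to pass through.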

In [None]:
#@markdown ## Extract LF0
!../RHVoice/src/scripts/general/voice-building-utils extract-f0

In [None]:
#@markdown ## Extract bap
!../RHVoice/src/scripts/general/voice-building-utils extract-bap

In [None]:
#@markdown ## MGC generation/rest of analysis
#@markdown ### Choose audio quality
audio_quality = "high" #@param ["high", "low"]
#@markdown ---
if audio_quality == "low":
    %cd data
    !make analysis
else:
    !../RHVoice/src/scripts/general/voice-building-utils extract-mgc
    %cd data
    !make cmp
%cd ..

In [None]:
#@markdown ## Resynthesize and display audio results
#@markdown >The result sounds similar to the voice that will be created.

#@markdown >Some clicks and short beeps are resynthesis artifacts and will not be present in the generated voice.

print("Resynthesizing...")
!../RHVoice/src/scripts/general/voice-building-utils synth
%matplotlib inline
import os
from IPython.display import Audio, display
# folder with the resynthesized recordings
dir_path = '/content/tts/rhwork/data/synth'
# collect the regular files in the folder, in a stable order
files = sorted(
    name for name in os.listdir(dir_path)
    if os.path.isfile(os.path.join(dir_path, name))
)
print("Results:")
# play up to four samples; display() is needed because a bare Audio object
# inside a loop is never rendered
for name in files[:4]:
    display(Audio(filename=os.path.join(dir_path, name)))
print("Done")

In [None]:
#@markdown ## Convert transcript to ssml and create test
#@markdown ---
#@markdown ### Write a sample text
sampletext = "This is just a test." #@param {type:"string"}
#@markdown ---
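print("Making transcript...")
!python /content/tts/RHVoice/src/scripts/general/text2ssml.py /content/tts/txt.done.data -l "$language" /content/tts/prompts.ssml
# create test:
# %%writefile only works on the first line of a cell and does not expand
# Python variables, so the file is written from Python instead
with open("/content/tts/rhwork/test.ssml", "w", encoding="utf-8") as f:
    f.write(f'<speak xml:lang="{language}">\n<s>{sampletext}</s>\n</speak>\n')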
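
Because the sample text is inserted into an XML document, characters such as `&` or `<` in it would make the SSML invalid. The sketch below shows one way to build the same three-line document with escaping, using only the standard library. `make_test_ssml` is a hypothetical helper, and the language value `"en"` is an assumption for the example; the notebook itself passes the language selected in the form.

```python
from xml.sax.saxutils import escape, quoteattr

def make_test_ssml(lang, text):
    """Build a one-sentence SSML document with XML-escaped content."""
    # quoteattr adds the surrounding quotes and escapes the attribute value
    return f"<speak xml:lang={quoteattr(lang)}>\n<s>{escape(text)}</s>\n</speak>\n"

print(make_test_ssml("en", "Salt & pepper < sugar."))
```

The returned string can be written to the test file with `open(path, "w", encoding="utf-8")`.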

In [None]:
#@markdown ## Configure SSML files, segment and label
# test.ssml is written to the rhwork directory, so the config must point there
!jq --arg pwd "/content/tts"  '.text=$pwd+"/prompts.ssml"|.test=$pwd+"/rhwork/test.ssml"'  training.cfg >training2.cfg &&mv training2.cfg training.cfg
print("Segmenting...")
!../RHVoice/src/scripts/general/voice-building-utils segment
print("Labelling...")
!../RHVoice/src/scripts/general/voice-building-utils label

In [None]:
#@markdown ## Make questions
!../RHVoice/src/scripts/general/voice-building-utils make-questions

In [None]:
#@markdown ## Create LPF
!../RHVoice/src/scripts/general/voice-building-utils make-lpf

In [None]:
#@markdown # Let's train!
!make voice

In [None]:
#@markdown ## Export voice
!../RHVoice/src/scripts/general/voice-building-utils export-voice

In [None]:
#@markdown ## Test your voice (beta)

#@markdown Warning: This might not work on Google Colab unless you have Colab Pro, which allows longer backend sessions.

#@markdown ---
text = "This is just a simple test of this speech synthesis." #@param {type:"string"}
!echo "$text" | ../RHVoice/local/bin/RHVoice-test -p Myvoice -o ./test.wav

In [None]:
#@markdown ## Improve voice quality (optional)

#@markdown This will create a new version of the voice by modifying the labels and other things. Therefore, after executing this cell, the following earlier cells must be run again:

#@markdown * Make LPF
#@markdown * Train
#@markdown * Export voice
#@markdown ---

!../RHVoice/src/scripts/general/voice-building-utils realign

# reference

(Creating a new voice for RHVoice. Grzezlo, update 09/12/2022): https://github.com/RHVoice/RHVoice/wiki/Creating-a-new-voice-for-RHVoice.