Commit
docs: switch to mkdocs
roedoejet committed Sep 7, 2023
1 parent fbf010e commit c5d9ff7
Showing 28 changed files with 479 additions and 631 deletions.
40 changes: 40 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,40 @@
name: Deploy docs
on:
  push:
    branches:
      - main
jobs:
  docs:
    # Create latest docs
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          token: ${{ secrets.SGILE_PAT }}
          submodules: recursive
          fetch-depth: 0 # fetch all commits/branches
      - name: Set up Conda
        uses: conda-incubator/setup-miniconda@v2
        with:
          python-version: 3.9
      - name: Install libsndfile
        run: sudo apt-get install -y libsndfile1
      - name: Install Torch related deps
        run: |
          conda install -c conda-forge pycountry pyworld
          pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
      - name: Install other dependencies
        run: |
          pip install coverage soundfile
          pip install -e .
      - name: Install documentation dependencies
        run: |
          pip install -r docs/requirements.txt
      - name: Setup doc deploy
        run: |
          git config user.name 'github-actions[bot]'
          git config user.email 'github-actions[bot]@users.noreply.github.com'
      - name: Deploy docs with mike 🚀
        run: |
          mkdocs build
          mike deploy --push --update-aliases latest
10 changes: 0 additions & 10 deletions .github/workflows/test.yml
@@ -32,20 +32,10 @@ jobs:
        run: |
          pip install coverage soundfile
          pip install -e .
          cd docs && pip install -r requirements.txt
      - name: Check licenses
        run: |
          pip install pip-licenses
          if pip-licenses | grep -E -v 'Artistic License|LGPL|Public Domain' | grep GNU; then echo 'Please avoid introducing *GPL dependencies'; false; fi
      - name: Docs
        run: |
          cd docs && make html
      - name: Deploy 🚀
        if: github.ref == 'refs/heads/main' # only publish the docs from main
        uses: JamesIves/github-pages-deploy-action@v4
        with:
          branch: gh-pages # The branch the action should deploy to.
          folder: docs/build/html # The folder the action should deploy.
      - name: Run tests
        run: |
          cd everyvoice && coverage run run_tests.py dev
Empty file removed docs/.nojekyll
Empty file.
20 changes: 0 additions & 20 deletions docs/Makefile

This file was deleted.

1 change: 1 addition & 0 deletions docs/guides/background.md
@@ -0,0 +1 @@
# Background to Text-to-Speech
119 changes: 119 additions & 0 deletions docs/guides/custom.md
@@ -0,0 +1,119 @@
# Customize to your language

## Step 1: Make sure you have Permission!

So, you want to build a text-to-speech system for a new language or dataset - cool! But just because you **can** build a text-to-speech system doesn't mean you **should**. Text-to-speech raises a lot of tricky ethical questions, and it is not ethical to use audio you found somewhere unless it comes with explicit permission to use it for text-to-speech. So the first step is to make sure you have
permission to use the data in question, and that whoever contributed their voice to it is aware of, and supports, your goal.

## Step 2: Gather Your Data

The first thing to do is to get all the data you have (in this case, audio with text transcripts) together in one place. Your audio should be in `wav` format: ideally 16-bit, mono (one channel), and sampled somewhere between 22.05 kHz and 48 kHz. If that doesn't mean anything to you, don't worry, we can ensure the right format in later steps.
It's best if your audio clips are between half a second and 10 seconds long; any longer and they can be difficult to train on. If your audio is longer than this, we suggest processing it into smaller chunks first.
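
If you need to convert or split your audio yourself, a command-line tool like `ffmpeg` will do the job. Here is a minimal sketch under those assumptions (`ffmpeg` is just one option, not a project requirement, and the file names are placeholders):

```bash
# Convert to 16-bit mono wav (re-encoding with the standard PCM codec);
# -ar sets the sample rate, -ac 1 downmixes to one channel
ffmpeg -i input.wav -ac 1 -ar 22050 -c:a pcm_s16le output.wav

# Naively split a long recording into chunks of at most 10 seconds;
# splitting at silences or sentence boundaries by hand gives better clips
ffmpeg -i long_recording.wav -f segment -segment_time 10 chunk_%03d.wav
```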

Your text should be consistently written and stored in a pipe-separated values (psv) file, similar to [this file](https://github.com/roedoejet/EveryVoice/blob/main/everyvoice/filelists/lj_full.psv). It should have one column containing the text and another containing the `basename` of the associated audio file. So, if you have a recording of somebody saying "hello how are you?" and the corresponding audio is called `mydata0001.wav`,
then you should have a psv file that looks like this:

```csv hl_lines="2"
basename|text
mydata0001|hello how are you?
mydata0002|some other sentence.
...
```

We also support comma- and tab-separated files, but recommend using pipes (|).
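
For example, the same filelist as a comma-separated file would look like this (note that pipes avoid escaping problems when the text itself contains commas):

```csv
basename,text
mydata0001,hello how are you?
mydata0002,some other sentence.
```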

You can also use the "festival" format, which looks like this (example from [Sinhala TTS](https://openslr.org/30/)):

```text
( sin_2241_0329430812 " කෝකටත් මං වෙනදා තරම් කාලෙ ගන්නැතිව ඇඳ ගත්තා " )
( sin_2241_0598895166 " ඇන්ජලීනා ජොලී කියන්නේ පසුගිය දිනවල බොහෝ සෙයින් කතා බහට ලක්වූ චරිතයක් " )
( sin_2241_0701577369 " ආර්ථික චින්තනය හා සාමාජීය දියුණුව ඇති කළ හැකිවනුයේ පුද්ගල ආර්ථික දියුණුව සලසා දීමෙන්ය " )
( sin_2241_0715400935 " ඉන් අදහස් වන්නේ විචාරාත්මක විනිවිද දැකීමෙන් තොර බැල්මයි " )
( sin_2241_0817100025 " අප යුද්ධයේ පළමු පියවරේදීම පරාද වී අවසානය " )
```

In this format, the corresponding wav files are labelled `sin_2241_0329430812.wav`, etc.
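
If your data arrives in this format, a one-liner can convert it to the psv layout described above. This is a rough sketch that assumes every line matches the `( basename " text " )` pattern exactly, and `festival_lines.txt` stands in for your file:

```bash
# Write the psv header, then strip the parentheses and quotes from each line
echo 'basename|text' > filelist.psv
sed -E 's/^\( ([^ ]+) " (.*) " \)$/\1|\2/' festival_lines.txt >> filelist.psv
```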

## Step 3: Install EveryVoice

Head over to the [install documentation](../install.md) and install EveryVoice.

## Step 4: Run the Configuration Wizard 🧙

Once you have your data, the best thing to do is run the Configuration Wizard 🧙. To do that, run:

```bash
everyvoice config-wizard
```

After running the configuration wizard, `cd` into your newly created directory. Let's call it `test` for now.

```bash
cd test
```
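
The wizard generates the configuration files used in the remaining steps. As a rough sketch of what to expect (the names below are inferred from the commands in this guide, and your output may differ):

```bash
ls config/
# expected to include something like:
# feature_prediction.yaml  vocoder.yaml  ...
```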

## Step 5: Run the Preprocessor

Your models need a number of preprocessing steps to prepare for training. To preprocess everything you need, run the following:

```bash
everyvoice fs2 preprocess -p config/feature_prediction.yaml
```

## Step 6: Train your Vocoder

```bash
everyvoice hifigan train -p config/vocoder.yaml
```

By default, we run our training with PyTorch Lightning's "auto" strategy, but if you are on a machine where you know the hardware, you can specify it:

```bash
everyvoice hifigan train -p config/vocoder.yaml -d 1 -a gpu
```

This would use the GPU accelerator and run on 1 device/chip.

## Step 7: Train your Feature Prediction Network

To generate audio when you train your feature prediction network, you need to add your vocoder checkpoint to `config/feature_prediction.yaml`.

At the bottom of that file you'll find a key called `vocoder_path`. Set it to the absolute path of your trained vocoder checkpoint (here that would be `/path/to/test/logs/VocoderExperiment/base/checkpoints/last.ckpt`, where `/path/to` is the actual path on your computer).
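
For example, the end of the file might then look like this (a sketch; the surrounding keys depend on your generated configuration):

```yaml
# config/feature_prediction.yaml (end of file)
vocoder_path: /path/to/test/logs/VocoderExperiment/base/checkpoints/last.ckpt
```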

Once you've set the `vocoder_path` key, you can train your feature prediction network:

```bash
everyvoice fs2 train -p config/feature_prediction.yaml
```

## Step 8: Synthesize Speech in Your Language!

You can synthesize by pointing the CLI to your trained feature prediction network and passing in the text. You can export to wav, npy, or pt files.

```bash
everyvoice fs2 synthesize logs/FeaturePredictionExperiment/base/checkpoints/last.ckpt -t "මෙදා සැරේ සාකච්ඡාවක් විදියට නෙවෙයි නේද පල කරල තියෙන්නෙ" -a gpu -d 1 -O wav
```
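
To export the prediction as a NumPy file instead, swap the output format; as noted above, `-O` accepts wav, npy, or pt:

```bash
everyvoice fs2 synthesize logs/FeaturePredictionExperiment/base/checkpoints/last.ckpt -t "මෙදා සැරේ සාකච්ඡාවක් විදියට නෙවෙයි නේද පල කරල තියෙන්නෙ" -a gpu -d 1 -O npy
```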

<!-- % Step 10 (optional): Finetune your vocoder
% ----------------------------------------
% .. code-block:: bash
% everyvoice e2e train -p config/e2e.yaml
% Step 11: Synthesize Speech
% --------------------------
% .. code-block:: bash
% everyvoice e2e synthesize -t "hello world" -c config/e2e.yaml
% .. warning::
% TODO: this doesn't exist yet
% TODO: e2e needs checkpoint paths -->
1 change: 1 addition & 0 deletions docs/guides/finetune.md
@@ -0,0 +1 @@
# How to fine-tune the existing checkpoints
7 changes: 7 additions & 0 deletions docs/guides/index.md
@@ -0,0 +1,7 @@
# Guides

Here is a selection of guides to help you through the process of training and using your own text-to-speech models.

1. [Background to Text-to-Speech](background.md)

2. [Customize to your language](custom.md)
16 changes: 16 additions & 0 deletions docs/index.md
@@ -0,0 +1,16 @@
% EveryVoice documentation master file, created by
% sphinx-quickstart on Mon Dec 5 17:51:09 2022.
% You can adapt this file completely to your liking, but it should at least
% contain the root `toctree` directive.

# Welcome to EveryVoice's documentation!

```{toctree}
:caption: 'Contents:'
:maxdepth: 2
start
install
guides/index
reference/index
```
18 changes: 18 additions & 0 deletions docs/install.md
@@ -0,0 +1,18 @@
# Installation

In order to train on GPUs, you must install PyTorch and CUDA. To do so, we recommend:

- installing [conda](https://docs.conda.io/projects/conda/en/stable/) or [miniconda](https://docs.conda.io/en/latest/miniconda.html)
- creating a new environment: `conda create --name EveryVoice python=3.9`
- activating the environment: `conda activate EveryVoice`
- following the [PyTorch installation instructions](https://pytorch.org/get-started/locally/) relevant to your hardware
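
For example, the project's own CI (see the workflow above) installs the CUDA 11.7 builds; match the CUDA version to your system:

```bash
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
```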

We then recommend an editable installation after cloning the repo from GitHub:

```bash
$ git clone https://github.com/roedoejet/EveryVoice.git
$ cd EveryVoice
$ git submodule update --init
$ conda activate EveryVoice
$ pip install -e .
```
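
To check that everything installed correctly, you can try invoking the CLI (assuming the `everyvoice` entry point exposes the usual `--help` flag):

```bash
$ everyvoice --help
```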
36 changes: 0 additions & 36 deletions docs/make.bat

This file was deleted.

39 changes: 39 additions & 0 deletions docs/overrides/partials/comments.html
@@ -0,0 +1,39 @@
{% if page.meta.comments %}
  <h2 id="__comments">{{ lang.t("meta.comments") }}</h2>
  <!-- Insert generated snippet here -->

  <!-- Synchronize Giscus theme with palette -->
  <script>
    var giscus = document.querySelector("script[src*=giscus]")

    /* Set palette on initial load */
    var palette = __md_get("__palette")
    if (palette && typeof palette.color === "object") {
      var theme = palette.color.scheme === "slate" ? "dark" : "light"
      giscus.setAttribute("data-theme", theme)
    }

    /* Register event handlers after document loaded */
    document.addEventListener("DOMContentLoaded", function() {
      var ref = document.querySelector("[data-md-component=palette]")
      ref.addEventListener("change", function() {
        var palette = __md_get("__palette")
        if (palette && typeof palette.color === "object") {
          var theme = palette.color.scheme === "slate" ? "dark" : "light"

          /* Instruct Giscus to change theme */
          var frame = document.querySelector(".giscus-frame")
          frame.contentWindow.postMessage(
            { giscus: { setConfig: { theme } } },
            "https://giscus.app"
          )
        }
      })
    })
  </script>
{% endif %}