docs: switch to mkdocs

EveryVoiceTTS · Sep 7, 2023 · c5d9ff7
1 parent fbf010e
commit c5d9ff7
Showing 28 changed files with 479 additions and 631 deletions.
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -0,0 +1,40 @@
+name: Deploy docs
+on:
+  push:
+    branches:
+      - main
+jobs:
+  docs:
+    # Create latest docs
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+        with:
+          token: ${{ secrets.SGILE_PAT }}
+          submodules: recursive
+          fetch-depth: 0 # fetch all commits/branches
+      - name: Set up Conda
+        uses: conda-incubator/setup-miniconda@v2
+        with:
+          python-version: 3.9
+      - name: Install libsndfile
+        run: sudo apt-get install -y libsndfile1
+      - name: Install Torch related deps
+        run: |
+          conda install -c conda-forge pycountry pyworld
+          pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
+      - name: Install other dependencies
+        run: |
+          pip install coverage soundfile
+          pip install -e .
+      - name: Install documentation dependencies
+        run: |
+          pip install -r docs/requirements.txt
+      - name: Setup doc deploy
+        run: |
+            git config user.name 'github-actions[bot]'
+            git config user.email 'github-actions[bot]@users.noreply.github.com'
+      - name: Deploy docs with mike 🚀
+        run: |
+          mkdocs build
+          mike deploy --push --update-aliases latest
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
@@ -32,20 +32,10 @@ jobs:
         run: |
           pip install coverage soundfile
           pip install -e .
-          cd docs && pip install -r requirements.txt
       - name: Check licenses
         run: |
           pip install pip-licenses
           if pip-licenses | grep -E -v 'Artistic License|LGPL|Public Domain' | grep GNU; then echo 'Please avoid introducing *GPL dependencies'; false; fi
-      - name: Docs
-        run: |
-          cd docs && make html
-      - name: Deploy 🚀
-        if: github.ref == 'refs/heads/main'  # only publish the docs from main
-        uses: JamesIves/github-pages-deploy-action@v4
-        with:
-          branch: gh-pages # The branch the action should deploy to.
-          folder: docs/build/html # The folder the action should deploy.
       - name: Run tests
         run: |
           cd everyvoice && coverage run run_tests.py dev

diff --git a/docs/.nojekyll b/docs/.nojekyll
diff --git a/docs/Makefile b/docs/Makefile
diff --git a/docs/guides/background.md b/docs/guides/background.md
@@ -0,0 +1 @@
+# Background to Text-to-Speech
diff --git a/docs/guides/custom.md b/docs/guides/custom.md
@@ -0,0 +1,119 @@
+# Customize to your language
+
+## Step 1: Make sure you have Permission!
+
+So, you want to build a text-to-speech system for a new language or dataset - cool! But just because you **can** build a text-to-speech system doesn't mean you **should**. There are a lot of tricky ethical
+questions around text-to-speech. It's not ethical to use audio you find somewhere unless you have explicit permission to use it for text-to-speech. The first step is to make sure you have
+permission to use the data in question and that whoever contributed their voice to the data is aware and supportive of your goal.
+
+## Step 2: Gather Your Data
+
+The first thing to do is to get all the data you have (in this case audio with text transcripts) together in one place. Your audio should be in 'wav' format. Ideally it would be 16-bit, mono (one channel) audio sampled somewhere between 22.05 kHz and 48 kHz. If that doesn't mean anything to you, don't worry, we can ensure the right format in later steps.
+It's best if your audio clips are somewhere between half a second and 10 seconds long. Any longer and it could be difficult to train. If your audio is longer than this, we suggest processing it into smaller chunks first.
+
+Your text should be consistently written and should be in a pipe-separated values (psv) file, similar to [this file](https://github.com/roedoejet/EveryVoice/blob/main/everyvoice/filelists/lj_full.psv). It should have a column that contains the text and a column that contains the `basename` of the associated audio file (the file name without its extension). So if you have a recording of somebody saying "hello how are you?" and the corresponding audio is called `mydata0001.wav`,
+then you should have a psv file that looks like this:
+
+```csv hl_lines="2"
+
+basename|text
+mydata0001|hello how are you?
+mydata0002|some other sentence.
+...
+```
+
+We also support comma- and tab-separated files, but recommend using pipes (|).
+
+You can also use the "festival" format which is like this (example from [Sinhala TTS](https://openslr.org/30/)):
+
+```text
+( sin_2241_0329430812 " කෝකටත් මං වෙනදා තරම් කාලෙ ගන්නැතිව ඇඳ ගත්තා " )
+( sin_2241_0598895166 " ඇන්ජලීනා ජොලී කියන්නේ පසුගිය දිනවල බොහෝ සෙයින් කතා බහට ලක්වූ චරිතයක් " )
+( sin_2241_0701577369 " ආර්ථික චින්තනය හා සාමාජීය දියුණුව ඇති කළ හැකිවනුයේ පුද්ගල ආර්ථික දියුණුව සලසා දීමෙන්ය " )
+( sin_2241_0715400935 " ඉන් අදහස් වන්නේ විචාරාත්මක විනිවිද දැකීමෙන් තොර බැල්මයි " )
+( sin_2241_0817100025 " අප යුද්ධයේ පළමු පියවරේදීම පරාද වී අවසානය " )
+```
+
+In this format, there are corresponding wav files named `sin_2241_0329430812.wav`, and so on.
+
+## Step 3: Install EveryVoice
+
+Head over to the [install documentation](../install.md) and install EveryVoice.
+
+## Step 4: Run the Configuration Wizard 🧙
+
+Once you have your data, the best thing to do is to run the Configuration Wizard 🧙. To do that, run:
+
+```bash
+everyvoice config-wizard
+```
+
+After running the config-wizard, cd into your newly created directory. Let's call it `test` for now.
+
+```bash
+cd test
+```
+
+## Step 5: Run the Preprocessor
+
+Your data needs to go through a number of preprocessing steps before training. To preprocess everything you need, run the following:
+
+```bash
+everyvoice fs2 preprocess -p config/feature_prediction.yaml
+```
+
+## Step 6: Train your Vocoder
+
+```bash
+everyvoice hifigan train -p config/vocoder.yaml
+```
+
+By default, we run our training with PyTorch Lightning's "auto" strategy. But if you know the hardware on your machine, you can specify it like this:
+
+```bash
+everyvoice hifigan train -p config/vocoder.yaml -d 1 -a gpu
+```
+
+This uses the GPU accelerator and specifies 1 device/chip.
+
+## Step 7: Train your Feature Prediction Network
+
+To generate audio when you train your feature prediction network, you need to add your vocoder checkpoint to `config/feature_prediction.yaml`.
+
+At the bottom of that file you'll find a key called `vocoder_path`. Set it to the absolute path of your trained vocoder (here it would be `/path/to/test/logs/VocoderExperiment/base/checkpoints/last.ckpt`, where `/path/to` is the actual path to it on your computer).
+
+Once you've set the `vocoder_path` key, you can train your feature prediction network:
+
+```bash
+everyvoice fs2 train -p config/feature_prediction.yaml
+```
+
+## Step 8: Synthesize Speech in Your Language!
+
+You can synthesize by pointing the CLI to your trained feature prediction network and passing in the text. You can export to wav, npy, or pt files.
+
+```bash
+everyvoice fs2 synthesize logs/FeaturePredictionExperiment/base/checkpoints/last.ckpt -t "මෙදා සැරේ සාකච්ඡාවක් විදියට නෙවෙයි නේද පල කරල තියෙන්නෙ" -a gpu -d 1 -O wav
+```
+
+<!-- % Step 10 (optional): Finetune your vocoder
+
+% ----------------------------------------
+
+% .. code-block:: bash
+
+% everyvoice e2e train -p config/e2e.yaml
+
+% Step 11: Synthesize Speech
+
+% --------------------------
+
+% .. code-block:: bash
+
+% everyvoice e2e synthesize -t "hello world" -c config/e2e.yaml
+
+% .. warning::
+
+% TODO: this doesn't exist yet
+
+% TODO: e2e needs checkpoint paths -->
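
Note on Step 2 of `docs/guides/custom.md`: the filelist layout described there can be sanity-checked before running the configuration wizard. The following is a minimal sketch under stated assumptions, not part of EveryVoice. The helper names `read_psv_filelist`, `parse_festival_line`, and `check_clips` are hypothetical, the 0.5 to 10 second bounds are taken from the guide's recommendation, and only the Python standard library is used (so only PCM wav files can be read).

```python
import csv
import re
import wave
from pathlib import Path

# Hypothetical helpers for illustration only; they are not EveryVoice APIs.

# Matches lines such as: ( some_basename " some text " )
FESTIVAL_LINE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def read_psv_filelist(path):
    """Return (basename, text) pairs from a pipe-separated file with a header row."""
    with open(path, newline="", encoding="utf8") as f:
        reader = csv.DictReader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        return [(row["basename"], row["text"]) for row in reader]


def parse_festival_line(line):
    """Return (basename, text) from one festival-style line."""
    match = FESTIVAL_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a festival-style line: {line!r}")
    return match.group(1), match.group(2).strip()


def check_clips(pairs, wav_dir, min_seconds=0.5, max_seconds=10.0):
    """Return (basename, problem) for clips that are missing or outside the duration range."""
    problems = []
    for basename, _text in pairs:
        wav_path = Path(wav_dir) / f"{basename}.wav"
        if not wav_path.exists():
            problems.append((basename, "missing wav file"))
            continue
        # The standard-library wave module reads PCM wav files only.
        with wave.open(str(wav_path), "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
        if not min_seconds <= seconds <= max_seconds:
            problems.append((basename, f"duration {seconds:.2f}s outside recommended range"))
    return problems
```

For example, `check_clips(read_psv_filelist("filelist.psv"), "wavs")` would list every basename whose wav file is absent from a `wavs` directory or whose duration falls outside the recommended range; both paths here are placeholders.
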
diff --git a/docs/guides/finetune.md b/docs/guides/finetune.md
@@ -0,0 +1 @@
+# How to fine-tune the existing checkpoints
diff --git a/docs/guides/index.md b/docs/guides/index.md
@@ -0,0 +1,7 @@
+# Guides
+
+Here is a selection of guides to help you through the process of training and using your own text-to-speech models.
+
+1. Background to TTS
+
+2. Custom
diff --git a/docs/index.md b/docs/index.md
@@ -0,0 +1,16 @@
+% EveryVoice documentation master file, created by
+% sphinx-quickstart on Mon Dec  5 17:51:09 2022.
+% You can adapt this file completely to your liking, but it should at least
+% contain the root `toctree` directive.
+
+# Welcome to EveryVoice's documentation!
+
+```{toctree}
+:caption: 'Contents:'
+:maxdepth: 2
+
+start
+install
+guides/index
+reference/index
+```
diff --git a/docs/install.md b/docs/install.md
@@ -0,0 +1,18 @@
+# Installation
+
+In order to train on GPUs, you must install PyTorch and CUDA. To do so, we recommend:
+
+- installing [conda](https://docs.conda.io/projects/conda/en/stable/) or [miniconda](https://docs.conda.io/en/latest/miniconda.html)
+- creating a new environment: `conda create --name EveryVoice python=3.9`
+- activating the environment: `conda activate EveryVoice`
+- following the [PyTorch installation instructions](https://pytorch.org/get-started/locally/) relevant to your hardware
+
+We then recommend an editable installation after cloning the repo from GitHub:
+
+```bash
+$ git clone https://github.com/roedoejet/EveryVoice.git
+$ cd EveryVoice
+$ git submodule update --init
+$ conda activate EveryVoice
+$ pip install -e .
+```
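
Note on `docs/install.md`: after following the steps above, it can help to confirm that the active environment has the expected Python version and that PyTorch can see a GPU. This is a minimal sketch, not part of EveryVoice; `describe_environment` is a hypothetical helper, and it relies only on `torch.__version__` and `torch.cuda.is_available()`, guarded so that it still runs when PyTorch is not installed.

```python
import importlib.util
import sys


def describe_environment():
    """Summarize the Python version and, if PyTorch is installed, its CUDA status."""
    info = {
        "python": "%d.%d" % sys.version_info[:2],
        "torch_installed": False,
        "cuda_available": False,
    }
    # Only import torch when it can be found, so the check works in any environment.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch_installed"] = True
        info["torch_version"] = torch.__version__
        info["cuda_available"] = bool(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False` on a machine with a GPU, the PyTorch installation instructions linked above are the place to look for a build that matches your hardware.
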
diff --git a/docs/make.bat b/docs/make.bat
diff --git a/docs/overrides/partials/comments.html b/docs/overrides/partials/comments.html
@@ -0,0 +1,39 @@
+{% if page.meta.comments %}
+<h2 id="__comments">{{ lang.t("meta.comments") }}</h2>
+<!-- Insert generated snippet here -->
+
+<!-- Synchronize Giscus theme with palette -->
+<script>
+    var giscus = document.querySelector("script[src*=giscus]")
+
+    /* Set palette on initial load */
+    var palette = __md_get("__palette")
+    if (palette && typeof palette.color === "object") {
+        var theme = palette.color.scheme === "slate" ? "dark" : "light"
+        giscus.setAttribute("data-theme", theme)
+    }
+
+    /* Register event handlers after the document has loaded */
+    document.addEventListener("DOMContentLoaded", function() {
+        var ref = document.querySelector("[data-md-component=palette]")
+        ref.addEventListener("change", function() {
+            var palette = __md_get("__palette")
+            if (palette && typeof palette.color === "object") {
+                var theme = palette.color.scheme === "slate" ? "dark" : "light"
+
+                /* Instruct Giscus to change theme */
+                var frame = document.querySelector(".giscus-frame")
+                frame.contentWindow.postMessage({
+                        giscus: {
+                            setConfig: {
+                                theme
+                            }
+                        }
+                    },
+                    "https://giscus.app"
+                )
+            }
+        })
+    })
+</script>
+{% endif %}