Skip to content
This repository has been archived by the owner on Jul 5, 2023. It is now read-only.
/ Voice2Voice Public archive

A Python neural network made with TensorFlow that converts one person's voice into another.

Notifications You must be signed in to change notification settings

AdinAck/Voice2Voice

Repository files navigation

Description

A Python neural network made with TensorFlow that converts one person's voice into another. The network is trained on audio files of person A's voice that person B needs to replicate to the best degree.


Setup

Prerequisites

  • Python 3.7 or greater (64-bit)
  • Python package requirements
  • NVIDIA Graphics card
  • Required Drivers and Development Tools

1. Package Requirements

To install the required python libraries, simply execute the following command in the repository working directory.

pip install -r requirements.txt

2. (Highly Recommended) Graphics Card Utilization

A) Required Drivers and Development Tools

  • Latest NVIDIA GPU Drivers
  • CUDA Toolkit (v11.0 Update 1)
  • cuDNN SDK 8.0.4 for CUDA Toolkit v11.0

B) cuDNN

Create a folder named tools under C:\. Drag the cuda folder from the cuDNN zip into the tools folder.

C) Appending %PATH%

Add the following paths to the PATH system environment variable:

  • C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.0\\bin
  • C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.0\\extras\\CUPTI\\lib64
  • C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.0\\include
  • C:\\tools\\cuda\\bin

3. (Optional) Configuration

In the config.ini file, there are attributes to be changed for configuring the model.

Section Parameter Description
MISC modelName The name of the model to be created or loaded.
verbose If set to True, additional information will be printed while running, along with additional files for easy debugging.
Structure sliceSize The number of time samples to be used in the input. (if sliceSize exceeds the length of an audio clip, the audio clip will be omitted from the training data)
hiddenLayers A list of integers defining the size of each hidden layer. (should be formatted like so: a,b,c,d or a)
Advanced learningRate The learning rate for the Adam optimizer. Tensorflow Documentation
lossFunc The loss function. Tensorflow Documentation
batchSize The batch size. Tensorflow Documentation

4. Data

Create folders

Voice2Voice.py -l

Running load_data for the first time creates the training and use folders.

Populate folders

Supplying data

  • Only wav files are supported.
  • Training files must be in order, corresponding with the files in the other folder.

Training/Input

Place person A's audio recordings into the folder.

Training/Output

Place person B's audio recordings in an order corresponding to person A's audio recordings. (Recommended naming example: "helloJohn.wav")

Use Folder

Place files to be used by model to perform voice conversion. (Suggest using training files as preliminary testing of model)

Convert

Voice2Voice.py -l

Running load_data for the second time converts the contents of inputs, outputs, and use into processable files.

5. Train Model

Voice2Voice.py -t

If let Run for 10,000 epochs or <Ctrl + C> is pressed, the model will be saved DO NOT CLOSE TERMINAL UNTIL MODEL SAVED.

6. Prediction Time!!!

Voice2Voice.py -p

Use the model assigned in config.ini to convert voices from use folder and place them in output.

Usage

Voice2Voice.py [-l | --load_data] [-f | --flush_data] [-t | --train] [-p | --predict]

Argument Description
[-l | load_data] Load audio files in training and use folders to be usable by the model. This will also create the training and use folders if they are not present.
[-f | flush_data] Delete all converted data.
[-t | --train] Create a new model to be trained, or continue training an existing model (dependent on the modelName attribute in config.ini). Exit training and save model by interrupting the process <Ctrl + C>.
[-p | --predict] Load model specified by modelName in config.ini and predict audio output given audio files in use folder.

About

A Python neural network made with TensorFlow that converts one person's voice into another.

Topics

Resources

Stars

Watchers

Forks

Languages