$Id: README.txt,v 1.8 2002/09/20 02:30:51 emanuel Exp $

========================================================================
Zerius Vocoder 1.3 README
------------------------------------------------------------------------
Emanuel Borsboom <firstname.lastname@example.org>
September 2002
========================================================================

This program is used to make speech sound cool.  It imposes the vocal
effects of speech onto another sound.  This technique has been made
popular by artists such as Kraftwerk and Laurie Anderson.

Note: This manual applies only to the command-line version.  If you are
using the GUI version, see the help file that is included with it.

------------
Installation
------------

If you are using the C source distribution, the first step is to compile
the vocoder.  Uncompress the archive and glance over the Makefile,
making sure the variables are all right.  The defaults should work for
most UNIX environments.  Also check config.h and make sure that S32,
U32, S16, and U16 are defined appropriately as signed and unsigned
32-bit and 16-bit integers, respectively.  The defaults should be fine
for most 32-bit platforms.  Once finished, run 'make' to compile the
vocoder.

Once you have the executable (or if you have downloaded a binary), you
can copy it to the directory where you keep your binaries, or just run
it where it is.

-----
Usage
-----

There are two ways to run the vocoder.  If it is run without any
command-line arguments (by clicking on its icon in Windows, for example)
it will ask you for the values of the parameters.  The meanings of the
parameters follow in the next section.

To specify the parameters on the command line, use the following syntax:

  vocoder [-q] [-N] [-b <band-count>] [-w <window-length>]
          [-o <window-overlap>] [-v <volume>]
          <modulator-file> <carrier-file> <output-file>

(Note: this version also supports the version 1.0 syntax in order to be
compatible with already existing front ends.)

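
If you drive the vocoder from a script, the syntax above can be put
together along the following lines.  This is only a sketch in Python:
the helper name build_vocoder_command, the assumption that the
executable is called "vocoder" and is on your PATH, and the file names
are all made up for the example.  Only the flags and the argument order
come from the syntax shown above.

```python
def build_vocoder_command(modulator, carrier, output, bands=None,
                          window_length=None, window_overlap=None,
                          volume=None, quiet=False, normalize=True,
                          executable="vocoder"):
    """Return an argument list that follows the documented syntax.

    Hypothetical helper: only the flags (-q, -N, -b, -w, -o, -v) and
    the order of the three file names are taken from the README.
    """
    args = [executable]
    if quiet:
        args.append("-q")            # turn off any displays
    if not normalize:
        args.append("-N")            # turn off normalizing
    if bands is not None:
        args += ["-b", str(bands)]
    if window_length is not None:
        args += ["-w", str(window_length)]
    if window_overlap is not None:
        args += ["-o", str(window_overlap)]
    if volume is not None:
        args += ["-v", str(volume)]
    # The three file names always come last, in this order.
    return args + [modulator, carrier, output]


if __name__ == "__main__":
    cmd = build_vocoder_command("speech.wav", "strings.wav", "out.wav",
                                bands=16, window_length=2048,
                                window_overlap=1024, quiet=True)
    print(" ".join(cmd))
    # To actually run it (assumes the vocoder binary is installed):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Any option you leave out is simply not passed, so the vocoder falls back
to its own default for it.
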
----------
Parameters
----------

A detailed explanation of what these parameters mean is in the next
section.

Modulator filename (<modulator-file>)
  the path to a sound file that contains the modulator waveform
  (required).

Carrier filename (<carrier-file>)
  the path to a sound file that contains the carrier waveform
  (required).

Window length (-w <window-length>)
  the number of samples that will be analyzed at a time; it must be a
  power of two (defaults to about 1/15th of a second worth of samples).

Window overlap (-o <window-overlap>)
  the number of samples by which the windows will overlap (defaults to
  one half of the window-length).

Band count (-b <band-count>)
  the number of frequency bands that the carrier will be modulated with
  (defaults to 16).

Output volume (-v <volume>)
  the volume the output will be scaled by (defaults to 1.0).

Output filename (<output-file>)
  the path to the output sound file (required).

These options are only available on the command line:

-N
  turns off normalizing the output with respect to the carrier.

-q
  turns off any displays.

The input sound files must be mono, 8- or 16-bit linear, uncompressed
AIFF or WAVE files.  The output sound file will have the same format as
the modulator (regardless of the file extension you give it).

-----------
Explanation
-----------

This channel vocoder works by analyzing the frequencies in the
modulator, splitting them into bands, finding the magnitude of each
band, and then amplifying the corresponding bands of the carrier by that
magnitude.

The modulator should simply be speech.  It works best if you speak very
clearly and more slowly than usual.

The carrier should be some kind of frequency-rich waveform.  White noise
works well.  Periodic white noise (i.e. a very short sample of white
noise) gives a "robot-like" sound.  Another one that sounds good is a
synthesized string chord.  The carrier waveform will automatically be
looped.  You can get interesting results by having the waveform change
over time.

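
If you do not have a carrier handy, a short burst of white noise is easy
to generate yourself.  The Python sketch below writes one as a mono,
16-bit linear WAVE file, which is one of the input formats listed above.
The helper name write_noise_carrier, the default of 2048 samples, and
the fixed random seed are assumptions made for the example; they are not
part of the vocoder.

```python
import random
import struct
import wave


def write_noise_carrier(path, sample_rate=44100, num_samples=2048, seed=0):
    """Write mono, 16-bit linear white noise to a WAVE file.

    Hypothetical helper for illustration.  Because the vocoder loops
    the carrier, a short file like this acts as periodic white noise.
    """
    rng = random.Random(seed)        # fixed seed: same noise every run
    samples = [rng.randint(-32768, 32767) for _ in range(num_samples)]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        # WAVE stores 16-bit samples as little-endian signed integers.
        wav.writeframes(struct.pack("<%dh" % num_samples, *samples))
```

Calling write_noise_carrier("noise_carrier.wav") gives you a file you
can pass as <carrier-file>.  A larger num_samples makes the noise repeat
less often and so sound less "robot-like".
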
Since what you pronounce changes over time, it would be pointless to
analyze the entire modulator waveform and excite those frequencies in
the carrier at once.  Instead, the program splits the modulator into
"windows", which it processes one at a time.  The window-length
specifies how many samples are in each window.  You will want at least a
few windows for every syllable.  If this number is too large, the output
will not be very understandable.  If it is too small, you will have
other problems.  Around 1/15th of a second (or the sampling rate of the
sound file divided by 15) tends to sound good, but experiment to find
the right value.  To give you an example, anywhere from 512 to 2048 is
okay for a modulator with a sampling rate of 44.1 kHz.  If you halve the
sampling rate, you should halve the window-length, etc.

The window-length must be a power of two due to the technique that is
used to analyze the frequencies.  For those of you who are unfamiliar
with the term "power of two," it means a number that can be created by
multiplying some number of twos together.  For example, the following
numbers are the powers of two up to 4096:

  2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096

You get the next power of two by doubling the previous one.

Since the sound is processed in discrete windows, the output can change
very abruptly where it goes from one window to the next.  This is
audible as a click.  To remedy this, the program can have the windows
overlap and cross-fade between them.  The window-overlap specifies how
many samples of overlap there are between windows.  1/8th of the
window-length tends to be a good starting point, but in many cases, one
half of the window-length gives the best results.  The window-overlap
may not exceed half of the window-length.

In order to excite the frequencies in the carrier, the frequencies of
the modulator are split into bands.  The larger your band-count, the
more the output will sound like the modulator.
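
The rules for these numbers can be checked before you run the vocoder.
The Python sketch below does that; the function names are made up for
the example.  Note one assumption in particular: it rounds the "1/15th
of a second" guideline down to a power of two, but the rounding the
program itself uses for its default is not stated in this README.

```python
def is_power_of_two(n):
    """True for 2, 4, 8, ... (1 is left out to match the list above)."""
    return n >= 2 and (n & (n - 1)) == 0


def suggested_window_length(sample_rate):
    """Largest power of two not exceeding sample_rate / 15.

    Assumption: rounding down is a choice made for this sketch, not
    necessarily what the vocoder does for its own default.
    """
    target = sample_rate // 15
    length = 2
    while length * 2 <= target:
        length *= 2                  # next power of two by doubling
    return length


def check_parameters(window_length, window_overlap, band_count):
    """Return a list of problems with the given values (empty if none)."""
    problems = []
    if not is_power_of_two(window_length):
        problems.append("window-length must be a power of two")
    if window_overlap > window_length // 2:
        problems.append(
            "window-overlap may not exceed half of the window-length")
    if band_count > window_length // 2:
        problems.append(
            "band-count may not exceed half of the window-length")
    return problems


if __name__ == "__main__":
    w = suggested_window_length(44100)
    print(w, check_parameters(w, w // 2, 16))
```

For a 44.1 kHz modulator this suggests 2048, which is inside the 512 to
2048 range mentioned above, and 1024 if you halve the sampling rate.
The band-count argument is checked against the limit given for it in
the text that follows.
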
The band-count should evenly divide the window-length for the best
results.  Somewhere between 8 and 64 usually sounds best.  The
band-count may not exceed half of the window-length.

If you find that the output is clipped (distorted) or is too quiet, you
can specify a value for the volume.  Anything less than one will reduce
the volume, and anything greater than one will increase it.

While the defaults for the parameters generally produce decent results,
the best results will be achieved by changing their values.  The best
way to figure out all the numbers and what the best waveforms are is to
experiment.  Have fun!

----------------
Closing Comments
----------------

Please see the web site at http://www.nuel.ca/Vocoder for the latest
information.  The latest version will always be available from there.

If you have any problems, don't hesitate to contact me.  I am always
pleased to help.  Also, drop me a line if you like this program, or have
any suggestions.  I am especially eager to hear your creations.  If you
release some music utilizing the vocoder, please tell me so I can try to
find it (freebies are always accepted)!  My e-mail address is
email@example.com.

Thanks to Cody Jones <firstname.lastname@example.org> for porting to
MacOS.

I appreciate any bug reports.

---------
Copyright
---------

The Zerius Vocoder is Copyright (C) 1996-1999, 2002 Emanuel Borsboom.

The FFT code (contained in fftn.c, fftaux.c, fft.h, and spt.h) is
Copyright (C) 1993 Steven Trainoff.

The code for converting to and from IEEE floating-point numbers is
Copyright (C) 1988-1991 Apple Computer Inc.

You are free to do whatever you like with the vocoder, as long as the
copyright notice stays intact and you note any changes.  There is no
warranty.