Phonetic Corpus of Audiobooks

The Phonetic Corpus of Audiobooks (PCA) is a linguistic corpus for phonetic and acoustic research on speech and articulation. It pairs audio recordings of audiobooks with their corresponding texts. The materials were downloaded from librivox.org and gutenberg.org, then segmented and synchronized using Python scripts and the Aeneas library.
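
To give a rough idea of what synchronized audio and text look like in practice, the sketch below reads a sync map in the JSON layout that Aeneas is understood to produce (a `fragments` list whose entries carry `id`, `begin`, `end`, and `lines`). It is not the project's own script: the sample data, the timings, and the helper names `load_segments` and `segments_containing` are all assumptions made for illustration.

```python
import json

# Hypothetical sync map in an Aeneas-style JSON layout.
# Times are strings in seconds; the timings below are invented.
SAMPLE_SYNC_MAP = """
{
  "fragments": [
    {"id": "f000001", "begin": "0.000", "end": "2.640",
     "lines": ["Call me Ishmael."]},
    {"id": "f000002", "begin": "2.640", "end": "6.120",
     "lines": ["Some years ago, never mind how long precisely,"]}
  ]
}
"""


def load_segments(sync_map_json):
    """Parse a sync map string into a list of segment dictionaries."""
    data = json.loads(sync_map_json)
    segments = []
    for fragment in data["fragments"]:
        segments.append({
            "id": fragment["id"],
            "start": float(fragment["begin"]),  # seconds from start of audio
            "end": float(fragment["end"]),
            "text": " ".join(fragment["lines"]),
        })
    return segments


def segments_containing(segments, word):
    """Return segments whose text contains `word`, ignoring case."""
    needle = word.lower()
    return [seg for seg in segments if needle in seg["text"].lower()]


if __name__ == "__main__":
    for seg in segments_containing(load_segments(SAMPLE_SYNC_MAP), "years"):
        print(seg["id"], seg["start"], seg["end"], seg["text"])
```

Each segment carries the start and end time of a text fragment, which is the information needed to cut the matching stretch of audio out of a longer recording.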

This GitHub repository contains the code for the corpus and a small part of the database. To explore the full database of over 100 audiobooks, visit pca.clarin-pl.eu.

Users can search for specific words and phrases or speech sound combinations in the corpus, and narrow their search by author, reader, and text criteria. For more information on the purpose and materials used in the creation of the corpus, as well as background data and corpus statistics, see the full documentation available at the project's website.
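
The kind of search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the website: the `Segment` record, its fields, the reader names, the ARPAbet-style transcriptions, and the `search` function are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical corpus record; field names are assumptions."""
    author: str
    reader: str
    text: str
    phonemes: tuple  # illustrative ARPAbet-style transcription


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def contains_sequence(sequence, sub):
    """True if `sub` occurs as a contiguous run inside `sequence`."""
    n = len(sub)
    return any(tuple(sequence[i:i + n]) == tuple(sub)
               for i in range(len(sequence) - n + 1))


def search(segments, word=None, sounds=None, author=None, reader=None):
    """Return segments matching every criterion that is not None."""
    results = []
    for seg in segments:
        if author is not None and seg.author != author:
            continue
        if reader is not None and seg.reader != reader:
            continue
        if word is not None and word.lower() not in tokenize(seg.text):
            continue
        if sounds is not None and not contains_sequence(seg.phonemes, sounds):
            continue
        results.append(seg)
    return results


# Invented sample data for the sketch.
CORPUS = [
    Segment("Herman Melville", "Reader A", "Call me Ishmael.",
            ("K", "AO", "L", "M", "IY", "IH", "SH", "M", "EY", "L")),
    Segment("Jane Austen", "Reader B", "It is a truth",
            ("IH", "T", "IH", "Z", "AH", "T", "R", "UW", "TH")),
]

if __name__ == "__main__":
    for seg in search(CORPUS, sounds=("T", "R"), author="Jane Austen"):
        print(seg.reader, "|", seg.text)
```

Criteria left as `None` are ignored, so a word search, a sound-sequence search, and the author and reader filters can be combined freely.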

Repository: Stolarski-Lukasz/PCA
