Dialect Classification

For many speech classification tasks, transcribing audio to text and then classifying the text is a valid workflow that avoids much of the awkwardness of working with audio. In some cases, however, salient information is lost during transcription. One such case is when we want to investigate a particular manner of speaking, such as an accent.

In this project, I fine-tuned a wav2vec2 model to classify Spanish speakers by accent, that is, by where they are from. The data covers 5 locales of Latin American Spanish, plus Basque, which is included to investigate how much easier language discrimination is than dialect discrimination.

The Data

The high-quality Spanish audio clips were introduced in a 2020 paper hosted on the ACL Anthology (Guevara-Rukoz et al., LREC 2020), available here.

For each locale, the dataset contains a variety of speakers with different speaker profiles. This is preferable to datasets with one or very few speakers, because the model is more likely to learn features of the dialect than of individual voices or prosodies.

The data for Basque comes from a similar campaign described here, only with a focus on minority Western European languages rather than Latin American Spanish.
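Because the goal is to learn dialects rather than voices, it helps to keep each speaker's clips entirely in either the training set or the test set. The following is a minimal sketch of such a speaker-disjoint split, not the notebook's actual code. The record fields (`speaker_id`, `label`, `path`), the helper name `split_by_speaker`, and the toy records are all hypothetical, and the split does not stratify by label.

```python
import random
from collections import defaultdict


def split_by_speaker(clips, test_fraction=0.2, seed=0):
    """Split clip records so that no speaker appears in both sets.

    `clips` is a list of dicts with a (hypothetical) "speaker_id" key.
    """
    by_speaker = defaultdict(list)
    for clip in clips:
        by_speaker[clip["speaker_id"]].append(clip)

    # Shuffle speakers (not clips) so whole speakers move together.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])

    train = [c for s in speakers if s not in test_speakers for c in by_speaker[s]]
    test = [c for s in speakers if s in test_speakers for c in by_speaker[s]]
    return train, test


# Toy records for illustration only: 10 made-up speakers, 5 clips each.
clips = [
    {
        "speaker_id": f"spk{i % 10}",
        "label": "locale_a" if i % 2 else "locale_b",
        "path": f"clip_{i}.wav",
    }
    for i in range(50)
]
train, test = split_by_speaker(clips)
train_speakers = {c["speaker_id"] for c in train}
test_speakers = {c["speaker_id"] for c in test}
```

With this kind of split, test accuracy measures how well the model handles voices it has never heard, which is the property the analysis below relies on.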

The Model

For this task, I used a pretrained wav2vec2 model as the base for fine-tuning. The model is introduced here.

A quick rundown of the model architecture (image from original paper).

At a high level, the model applies a CNN to the normalized waveform to extract latent features. A transformer network then builds a contextualized representation from these features. During pretraining, the latent features are also discretized (quantized), which helps the model identify distinct speech units: the quantized representations serve as the targets that the contextualized representations are trained to pick out.
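To get a feel for what the CNN feature extractor hands to the transformer, the sketch below computes how many feature frames a waveform of a given length produces. It assumes the kernel widths and strides reported in the wav2vec 2.0 paper for the feature encoder and 16 kHz input audio; whether the checkpoint used in this project matches that configuration is an assumption. The function names are illustrative.

```python
# Feature encoder settings as reported in the wav2vec 2.0 paper
# (assumed to apply to the checkpoint used here).
KERNELS = (10, 3, 3, 3, 3, 2, 2)
STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000  # assumed input sampling rate in Hz


def num_frames(num_samples):
    """Number of feature frames for a waveform with no padding."""
    length = num_samples
    for kernel, stride in zip(KERNELS, STRIDES):
        length = (length - kernel) // stride + 1
    return length


def receptive_field_and_stride():
    """Input samples seen by one frame, and the hop between frames."""
    field, jump = 1, 1
    for kernel, stride in zip(KERNELS, STRIDES):
        field += (kernel - 1) * jump
        jump *= stride
    return field, jump


frames_per_second = num_frames(SAMPLE_RATE)
field, hop = receptive_field_and_stride()
print(frames_per_second, field, hop)
```

Under these assumptions each frame covers 400 samples (25 ms) and frames are 320 samples (20 ms) apart, so a short clip becomes a sequence of a few hundred vectors for the transformer rather than tens of thousands of raw samples.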

Fine Tuning

The training process for this model was relatively lightweight, taking only about half an hour (and the model was already starting to overfit within that time).
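Since overfitting sets in quickly, one common guard is early stopping on validation loss. The sketch below is a generic illustration of that technique, not something the notebook is known to use. The class name `EarlyStopping`, the `patience` value, and the validation losses are made up for the example.

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: improving at first, then rising (overfitting).
val_losses = [0.90, 0.45, 0.30, 0.33, 0.38, 0.41]

stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, best epoch was {stopper.best_epoch}")
```

In practice the checkpoint saved at the best epoch would be the one kept for evaluation.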

Results

Accuracy = 0.980

Analysis

Even with the training and test sets split by speaker, the model achieves excellent accuracy on the test set. This is interesting because the training objective could just as easily have been met by memorizing individual voices, which would not carry over to unseen speakers. The result indicates that accent classification, at least at this granularity, is an easier task than voice identification.

The confusion matrix shows that Basque is the most easily distinguished class, which is expected since it is the only language that isn't Spanish. Puerto Rican Spanish was the hardest to identify in the test set, but I think this has more to do with Puerto Rico having the least data than with anything about the accent itself.
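For readers who want to reproduce this kind of per-class reading, here is a minimal sketch of building a confusion matrix and deriving accuracy and per-class recall from it. The labels and predictions are toy values invented for illustration (they are not this project's results), and the class names `locale_a` and `locale_b` are placeholders.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Fraction of all examples that fall on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix, labels):
    """Fraction of each true class that was predicted correctly."""
    return {
        label: row[i] / sum(row) if sum(row) else float("nan")
        for i, (label, row) in enumerate(zip(labels, matrix))
    }


# Toy data for illustration only.
labels = ["basque", "locale_a", "locale_b"]
y_true = ["basque"] * 4 + ["locale_a"] * 4 + ["locale_b"] * 2
y_pred = ["basque"] * 4 + ["locale_a"] * 3 + ["locale_b"] + ["locale_b", "locale_a"]

matrix = confusion_matrix(y_true, y_pred, labels)
recall = per_class_recall(matrix, labels)
print(matrix)
print(accuracy(matrix), recall)
```

Per-class recall is the useful number when classes are imbalanced: a class with few test clips, like the smallest locale here, can have low recall while barely moving overall accuracy.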

I think that if this experiment were repeated with a dataset of the same size but more speakers (and so less opportunity to fit individual voices), we could expect near-perfect accuracy.

References

Crowdsourcing Latin American Spanish for Low-Resource Text-to-Speech (Guevara-Rukoz et al., LREC 2020)

Open-Source High Quality Speech Datasets for Basque, Catalan and Galician (Kjartansson et al., SLTU 2020)

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., Facebook AI 2020)

Huggingface Audio Classification Tutorial

This Model on Huggingface Hub

https://huggingface.co/hhsavich/accent_determinator
