This repository has been archived by the owner on Mar 8, 2023. It is now read-only.

Experiment on creating a new dataset audio+text #107

Open
Mte90 opened this issue Nov 30, 2020 · 3 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@Mte90
Member

Mte90 commented Nov 30, 2020

In #90 we are discussing creating a new dataset, but we need to experiment with how to automate it (including the review step).

The first experiment we can do is:

@Mte90 Mte90 added enhancement New feature or request help wanted Extra attention is needed good first issue Good for newcomers labels Nov 30, 2020
@eziolotta
Contributor

I'm starting some tests of long-audio segmentation based on the speaker's voice activity.
Fork: https://github.com/eziolotta/rVADfast

But I have some problems with the quality of the output audio...
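To make "segmentation based on voice activity" concrete, the first step is labelling short frames as speech or non-speech. rVAD uses a more elaborate, noise-robust method; the sketch below swaps in a plain frame-energy detector as an illustrative stand-in, assuming float samples in [-1.0, 1.0]. The function name `frame_energy_vad` and the default 20 ms frame and -35 dBFS threshold are hypothetical choices, not values from the fork.

```python
import math
from typing import List, Sequence


def frame_energy_vad(samples: Sequence[float], sample_rate: int,
                     frame_ms: int = 20, threshold_db: float = -35.0) -> List[bool]:
    """Label each non-overlapping frame as speech (True) or non-speech (False).

    Simple energy detector (NOT rVAD): a frame counts as speech when its RMS
    level in dB relative to full scale exceeds `threshold_db`.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    labels = []
    # Trailing samples that do not fill a whole frame are ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        labels.append(level_db > threshold_db)
    return labels
```

An energy threshold works on clean recordings but is easily fooled by background noise, which is one reason to prefer rVAD on real data.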

@eziolotta
Contributor

eziolotta commented Jan 31, 2021

First experiment in segmenting short audio, using rVADfast plus an algorithm that analyzes the segments found by rVAD to generate a new sequence of speech segments.
rVAD (like some other VADs) tends to cut off the last part of the signal in a speech segment.
Code and further tests are yet to be published.
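The actual post-processing algorithm is unpublished, so the following is only a sketch of one common way to turn frame-level VAD labels into speech segments while addressing the cut-off endings: merge runs separated by short pauses, then extend each segment's end by a "hangover" margin. The function `labels_to_segments` and its default parameters (300 ms minimum gap, 200 ms hangover) are hypothetical.

```python
from typing import List, Sequence, Tuple


def labels_to_segments(labels: Sequence[bool], frame_ms: int = 20,
                       min_gap_ms: int = 300, hangover_ms: int = 200
                       ) -> List[Tuple[float, float]]:
    """Convert per-frame speech labels into (start_s, end_s) segments."""
    # 1. Collect runs of consecutive speech frames as [start, end) frame indices.
    runs, start = [], None
    for i, is_speech in enumerate(labels):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            runs.append([start, i])
            start = None
    if start is not None:
        runs.append([start, len(labels)])

    # 2. Merge runs separated by pauses shorter than min_gap_ms.
    min_gap = min_gap_ms // frame_ms
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1][1] = run[1]
        else:
            merged.append(run)

    # 3. Extend each end by the hangover so trailing speech is not clipped,
    #    without overlapping the next segment or running past the audio.
    hang = hangover_ms // frame_ms
    segments = []
    for k, (s, e) in enumerate(merged):
        limit = merged[k + 1][0] if k + 1 < len(merged) else len(labels)
        e = min(e + hang, limit)
        segments.append((s * frame_ms / 1000, e * frame_ms / 1000))
    return segments
```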

Input clip: 644_2532_000000.wav (15 seconds, MLS dataset)
Output: 5 speech segments (WAV files)

test_segmentation_short_audio.zip
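Producing one WAV file per detected segment needs only the standard-library `wave` module. This is a sketch assuming uncompressed PCM input and segments given as (start_s, end_s) pairs in seconds; the function name `write_segments` and the `<stem>_segNNN.wav` naming scheme are hypothetical.

```python
import os
import tempfile
import wave
from typing import List, Sequence, Tuple


def write_segments(in_path: str, segments: Sequence[Tuple[float, float]],
                   out_dir: str) -> List[str]:
    """Cut each (start_s, end_s) segment out of a WAV file into its own WAV file."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(in_path))[0]
    paths = []
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for idx, (start_s, end_s) in enumerate(segments):
            start = round(start_s * rate)
            n_frames = round(end_s * rate) - start
            src.setpos(start)                # jump to the segment start (in frames)
            data = src.readframes(n_frames)  # raw PCM bytes for the segment
            out_path = os.path.join(out_dir, f"{stem}_seg{idx:03d}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)        # same channels, sample width, rate
                dst.writeframes(data)        # header frame count is patched on write
            paths.append(out_path)
    return paths


if __name__ == "__main__":
    # Demo on a synthetic 1-second, 16 kHz, 16-bit mono file of silence.
    tmp = tempfile.mkdtemp()
    clip = os.path.join(tmp, "clip.wav")
    with wave.open(clip, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * 16000)
    for p in write_segments(clip, [(0.0, 0.25), (0.5, 1.0)], os.path.join(tmp, "segments")):
        print(p)
```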

I'll try to extend the algorithm to long audio (possibly hours long, e.g. public podcasts).

@eziolotta
Contributor

eziolotta commented Feb 13, 2021

Continuing the experiments with rVADfast, I was able to segment a randomly chosen podcast from the Emilia-Romagna Region:

https://ambiente.regione.emilia-romagna.it/it/gallery/video/i-video-di-ermesambiente/convegno-inspire/stefano-olivucci-regione-emilia-romagna

This produced 143 segments, ranging in duration from 2 seconds to 2 minutes.
The process took approximately 1.5 hours to run.
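If a dataset needs segment lengths kept within a range like the 2 s to 2 min observed here, a simple post-filter can enforce it. This sketch assumes those two values as hard bounds (the thread reports them only as the observed range) and splits over-long segments into equal parts, which is naive: splitting at low-energy points would avoid cutting through words. The name `enforce_duration` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def enforce_duration(segments: Sequence[Tuple[float, float]],
                     min_s: float = 2.0, max_s: float = 120.0
                     ) -> List[Tuple[float, float]]:
    """Drop segments shorter than min_s; split longer-than-max_s ones evenly."""
    out = []
    for start, end in segments:
        dur = end - start
        if dur < min_s:
            continue  # too short to be useful; discard
        n_parts = math.ceil(dur / max_s)  # fewest equal parts that fit in max_s
        bounds = [start + dur * k / n_parts for k in range(n_parts)] + [end]
        out.extend(zip(bounds[:-1], bounds[1:]))
    return out
```

Because `n_parts` is the smallest count that fits, every piece of a split segment is longer than `max_s / 2`, so splitting never produces pieces below `min_s` with these defaults.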

The audio has no transcription, so in this case automatic transcription followed by human validation is required.

Unfortunately, other speakers also take part in the podcasts, and some words are not clear, so these passages must be checked during validation. On the plus side, the podcasts have no background noise and the audio is clean.
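One way to organize the "automatic transcription, then human validation" step is a review manifest: one row per segment holding the machine transcript, an empty column for the corrected text, and a status that reviewers update (e.g. to flag overlapping speakers). This is a hypothetical sketch; `transcribe` stands in for whatever ASR system is used, and the column names are illustrative assumptions.

```python
import csv
import io
from typing import Callable, Iterable


def build_review_manifest(segment_paths: Iterable[str],
                          transcribe: Callable[[str], str]) -> str:
    """Return CSV text with one row per segment, ready for human review.

    `transcribe` is a caller-supplied (hypothetical) function mapping a
    WAV path to an automatic transcript.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["path", "auto_transcript", "validated_transcript", "status"])
    for path in segment_paths:
        # Reviewers fill validated_transcript and change status from "pending".
        writer.writerow([path, transcribe(path), "", "pending"])
    return buf.getvalue()
```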

Other podcasts are available here.
License: Creative Commons Attribution 4.0

The output dataset from my experiment can be downloaded here:
http://t.ly/xHHL


2 participants