PyAudioCensor

Copyright (c) 2023 Mayukhmali Das

Pypi package has been already deployed. Install using

pip install PyAudioCensor==0.0.1

Note install vosk using pip install vosk before running the module. Also download a specific vosk model and put it in a model folder. Specify that model path in model_path input of the main function of the PyAudioCensor module.

Firstly we have to give the audio file to be censored. Then we perform offline speech recognition on the audio using Vosk api. Thus we will get a list of words with their respective timestamps. Now the user will input which words to censor. The respective timestamps of those words are found out and then the censor audio provided by the user is overlaid at those instances.

Audio files can be automatically censored using the censor_audio function of my PyAudioCensor module. Process will be completely offline.

def censor_audio(base_audio_path, censor_audio_path, output_audio_path, model_path, to_censor, gain_of_censor=0, gain_of_base=-40,  silent=1):

Description of function parameters:

Base_Audio_Path:

The path to the base audio to censor.

Censor_Audio_Path:

The path to the audio which will be used to censor.

Output_Audio_Path:

The location where the censored audio will be stored.

Model_path:

Path for the Vosk Model https://alphacephei.com/vosk/models

Download a model from the Vosk website and save in the model folder. The file is too big to upload in Github. You can also find a google drive link at the bottom of the readme which has the complete package along with the model preinstalled.

Vosk API is a speech recognition toolkit developed by Alpha Cephei Inc., which is based on the Kaldi toolkit. Kaldi is a free and open-source toolkit for speech recognition developed by the Johns Hopkins University Speech and Language Processing Group. It uses modern machine learning techniques, including deep neural networks, to achieve high accuracy and efficiency.

The offline speech recognition with timestamps part is done with the help of Dmytro Nikolaiev work referenced below: https://gitlab.com/Winston-90/foreign_speech_recognition/-/tree/main/timestamps

To_censor:

List of words to be censored

Gain_of_Base :

Gain of the base audio.

Gain_of_Overlay:

Gain of the censor audio.

Silent :

Make base audio silent while overlaying the censor audio.

Example of implementation :

I will be demonstrating the working of this package using the song "Happier" by Marshmello

from PyAudioCensor import main


main.censor_audio("base_audio.wav","overlay_audio.wav","censored.wav",model_path="PyCensorAudio\model", to_censor=["happier","morning","story","mind"],silent=1)

Here is the speech recognition of the song along with timestamps given by Vosk

they                 from 0.33 sec to 0.62 sec, confidence is 63.98%  

they                 from 0.62 sec to 1.23 sec, confidence is 100.00%  

are                  from 1.26 sec to 1.59 sec, confidence is 100.00%

the                  from 1.59 sec to 2.13 sec, confidence is 100.00%

oven                 from 2.16 sec to 2.76 sec, confidence is 30.85%

thing                from 2.76 sec to 3.60 sec, confidence is 100.00%

i                    from 3.63 sec to 3.96 sec, confidence is 100.00%

want                 from 3.96 sec to 4.32 sec, confidence is 100.00%

you                  from 4.32 sec to 4.53 sec, confidence is 100.00%

to                   from 4.53 sec to 4.83 sec, confidence is 100.00%

be                   from 4.86 sec to 5.13 sec, confidence is 100.00%

happier              from 5.13 sec to 5.97 sec, confidence is 84.00%

i                    from 6.03 sec to 6.33 sec, confidence is 100.00%

want                 from 6.33 sec to 6.72 sec, confidence is 100.00%

you                  from 6.72 sec to 6.90 sec, confidence is 100.00%

to                   from 6.90 sec to 7.20 sec, confidence is 100.00%

be                   from 7.20 sec to 7.53 sec, confidence is 100.00%

happier              from 7.53 sec to 8.94 sec, confidence is 100.00%

when                 from 8.97 sec to 9.21 sec, confidence is 100.00%

the                  from 9.21 sec to 9.30 sec, confidence is 100.00%

morning              from 9.30 sec to 9.90 sec, confidence is 100.00%

comes                from 9.90 sec to 10.83 sec, confidence is 100.00%

we                   from 10.93 sec to 11.10 sec, confidence is 92.56%

see                  from 11.10 sec to 11.40 sec, confidence is 100.00%

what                 from 11.40 sec to 11.67 sec, confidence is 98.79%

we've                from 11.73 sec to 12.06 sec, confidence is 89.67%

become               from 12.06 sec to 12.90 sec, confidence is 83.34%

in                   from 13.14 sec to 13.47 sec, confidence is 100.00%

colada               from 13.50 sec to 14.10 sec, confidence is 100.00%

day                  from 14.13 sec to 14.37 sec, confidence is 93.45%

where                from 14.37 sec to 14.63 sec, confidence is 68.65%

flaming              from 14.67 sec to 15.33 sec, confidence is 97.49%

when                 from 15.33 sec to 15.57 sec, confidence is 90.09%

notified             from 15.57 sec to 16.44 sec, confidence is 100.00%

we                   from 16.47 sec to 16.83 sec, confidence is 90.59%

be                   from 16.83 sec to 17.49 sec, confidence is 100.00%

gone                 from 17.49 sec to 18.33 sec, confidence is 100.00%

every                from 18.60 sec to 18.96 sec, confidence is 100.00%

argument             from 18.96 sec to 20.10 sec, confidence is 100.00%

every                from 20.43 sec to 20.76 sec, confidence is 100.00%

word                 from 20.76 sec to 21.09 sec, confidence is 100.00%

we                   from 21.09 sec to 21.27 sec, confidence is 100.00%

translate            from 21.30 sec to 21.93 sec, confidence is 100.00%

back                 from 21.93 sec to 22.71 sec, confidence is 100.00%

just                 from 22.77 sec to 23.01 sec, confidence is 77.98%

read                 from 23.01 sec to 23.22 sec, confidence is 39.86%

all                  from 23.22 sec to 23.37 sec, confidence is 77.98%

that                 from 23.37 sec to 23.58 sec, confidence is 100.00%

has                  from 23.58 sec to 23.76 sec, confidence is 100.00%

happened             from 23.76 sec to 24.12 sec, confidence is 100.00%

nothing              from 24.12 sec to 24.57 sec, confidence is 100.00%

now                  from 24.57 sec to 24.72 sec, confidence is 47.76%

we                   from 24.72 sec to 24.90 sec, confidence is 57.61%

prefer               from 24.90 sec to 25.35 sec, confidence is 57.61%

to                   from 25.35 sec to 25.50 sec, confidence is 56.25%

play                 from 25.50 sec to 25.74 sec, confidence is 56.25%

the                  from 25.74 sec to 25.98 sec, confidence is 45.44%

story                from 25.98 sec to 27.39 sec, confidence is 100.00%

i                    from 32.43 sec to 32.67 sec, confidence is 41.46%

wanted               from 32.68 sec to 33.16 sec, confidence is 29.49%

to                   from 33.16 sec to 33.24 sec, confidence is 43.02%

change               from 33.25 sec to 33.66 sec, confidence is 76.04%

my                   from 33.66 sec to 33.94 sec, confidence is 81.84%

mind                 from 33.94 sec to 34.50 sec, confidence is 73.78%

right                from 36.33 sec to 36.90 sec, confidence is 100.00%

the                  from 47.16 sec to 47.61 sec, confidence is 100.00%

hey                  from 48.72 sec to 57.06 sec, confidence is 26.46%

Our interest is in the words and their respective starting times. So we format the above output as below:

['they', 0.33]
['they', 0.621496]
['are', 1.26]
['the', 1.59]
['oven', 2.16]
['thing', 2.76]
['i', 3.63]
['want', 3.96]
['you', 4.32]
['to', 4.53]
['be', 4.86]
['happier', 5.13]
['i', 6.03]
['want', 6.33]
['you', 6.72]
['to', 6.9]
['be', 7.2]
['happier', 7.53]
['when', 8.97]
['the', 9.21]
['morning', 9.3]
['comes', 9.9]
['we', 10.933383]
['see', 11.1]
['what', 11.4]
["we've", 11.73]
['become', 12.06]
['in', 13.14]
['colada', 13.5]
['day', 14.13]
['where', 14.37]
['flaming', 14.67]
['when', 15.33]
['notified', 15.569462]
['we', 16.47]
['gone', 17.49]
['every', 18.6]
['argument', 18.96]
['every', 20.43]
['word', 20.76]
['we', 21.09]
['translate', 21.3]
['back', 21.93]
['just', 22.77]
['read', 23.01]
['all', 23.22]
['that', 23.37]
['has', 23.58]
['happened', 23.76]
['nothing', 24.12]
['now', 24.57]
['we', 24.72]
['prefer', 24.9]
['to', 25.35]
['play', 25.5]
['the', 25.74]
['story', 25.977642]
['i', 32.43]
['wanted', 32.67538]
['to', 33.157621]
['change', 33.24668]
['my', 33.66]
['mind', 33.944718]
['right', 36.33]
['the', 47.16]
['hey', 48.72]

Now the user will have to give a list of words to be censored. The below code will find all the occurrences of those words and will create a list of starting times to be censored.

time_list=timestamp_list(base_audio_path,model_path)


    def find_time_occurrences(to_censor):
        result = []
        for word in to_censor:
            for item in time_list:
                if item[0] == word:
                    result.append(item[1]*1000)
        return result


    censor_time=find_time_occurrences(to_censor)


    positions=censor_time


    for value in censor_time:
       print(value)

Here in this case we are censoring the words

["happier","morning","story","mind"].

Below is the list of start times for all these words generated by the above function.

Now the censor audio will be overlayed at these particular timestamps(in ms).

Here are links to all three audio files and the full package:

Base Audio

https://drive.google.com/file/d/1A_V6ycI4DMyGEyzNZmv60ZxUZObwIUiL/view?usp=sharing

Overlay Audio

https://drive.google.com/file/d/1blOtMAF08ZCaXR0dIJiqWXnVMl_yipFm/view?usp=share_link

Censored Audio

https://drive.google.com/file/d/1JK9dBF96_Vqx6-LHC6fdkbeyUCqhJnv9/view?usp=sharing

PyAudioCensor Package:

https://drive.google.com/drive/folders/1JXyV5dfNGy_3aUsMrsdOoEzWWJ9vOqTn

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
PyAudioCensor		PyAudioCensor
LICENSE		LICENSE
PyAudioCensor_Demo_Mayukhmali_Das.ipynb		PyAudioCensor_Demo_Mayukhmali_Das.ipynb
PyAudioCensor_Mayukhmali_Das.pdf		PyAudioCensor_Mayukhmali_Das.pdf
README.md		README.md
base_audio.wav		base_audio.wav
overlay_audio.wav		overlay_audio.wav
test.py		test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PyAudioCensor

Copyright (c) 2023 Mayukhmali Das

Note install vosk using pip install vosk before running the module. Also download a specific vosk model and put it in a model folder. Specify that model path in model_path input of the main function of the PyAudioCensor module.

Description of function parameters:

Base_Audio_Path:

Censor_Audio_Path:

Output_Audio_Path:

Model_path:

Download a model from the Vosk website and save in the model folder. The file is too big to upload in Github. You can also find a google drive link at the bottom of the readme which has the complete package along with the model preinstalled.

To_censor:

Gain_of_Base :

Gain_of_Overlay:

Silent :

Example of implementation :

Copyright (c) 2023 Mayukhmali Das

About

Releases

Packages

Languages

License

Smartmayukh/PyAudioCensor

Folders and files

Latest commit

History

Repository files navigation

PyAudioCensor

Copyright (c) 2023 Mayukhmali Das

Note install vosk using pip install vosk before running the module. Also download a specific vosk model and put it in a model folder. Specify that model path in model_path input of the main function of the PyAudioCensor module.

Description of function parameters:

Base_Audio_Path:

Censor_Audio_Path:

Output_Audio_Path:

Model_path:

Download a model from the Vosk website and save in the model folder. The file is too big to upload in Github. You can also find a google drive link at the bottom of the readme which has the complete package along with the model preinstalled.

To_censor:

Gain_of_Base :

Gain_of_Overlay:

Silent :

Example of implementation :

Copyright (c) 2023 Mayukhmali Das

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages