### Overview
 
VideoDB's [Programmable Streams](https://docs.videodb.io/version-0-0-3-timeline-and-assets-44) are perfect for personalizing content to meet users' requirements. If users prefer not to hear curse words in their content, VideoDB allows these words to be either removed or replaced with a sound overlay such as a beep.

This task, typically complex for video editors, can be accomplished with just **a few lines of code** using VideoDB.

This technique can also serve as a valuable **Content Moderation** component for any social content platform, ensuring that content meets the preferences and standards of its audience. 

Let's dive in!

In [10]:
#update to latest version
!pip install -U videodb



### Prerequisites
Ensure you have VideoDB installed in your environment. If not, run `pip install videodb` in your terminal, or `!pip install videodb` in a notebook cell.

You'll also need a `VideoDB API key`, which can be obtained from the VideoDB console.
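To avoid pasting the key directly into the notebook, it can be read from an environment variable. The following is a minimal sketch; the helper name `load_api_key` is hypothetical, and the variable name `VIDEO_DB_API_KEY` is an assumption that you can change to whatever your setup uses.

```python
import os

def load_api_key(env_var="VIDEO_DB_API_KEY", environ=os.environ):
    """Return the API key stored in `env_var`, or raise a clear error."""
    # env_var is an assumed name; adjust it to match your environment
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key

# Usage (requires the videodb package and a valid key):
# from videodb import connect
# conn = connect(api_key=load_api_key())
```

Failing early with a clear message is usually easier to debug than an authentication error later in the notebook.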

In [3]:
# create a new connection with your API key
from videodb import connect, play_stream
conn = connect(api_key="")

### Source Content

For this tutorial, let's take a Joe Rogan clip in which he tries to trick Siri into using curse words 🤣

In [2]:
# Joe Rogan video clip
video = conn.upload(url='https://www.youtube.com/watch?v=7MV6tUCUd-c')

In [9]:
#watch the original video
o_stream = video.generate_stream()
play_stream(o_stream)

'https://console.videodb.io/player?url=https://stream.videodb.io/v3/published/manifests/53db55a5-8fb1-44a0-b8c2-62cc1b4be532.m3u8'

In [3]:
# index spoken content in the video
video.index_spoken_words()

100%|███████████████████████████████████████████████████████████████████████████


### Create `beep` Asset

We have a sample beep sound in this folder, `beep.wav`. For those looking to add a more playful or unique touch, replacing the beep with alternative sound effects, such as a quack or any other sound, can make the content more engaging and fun.
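If you don't have a `beep.wav` on hand, a plain tone can be generated with Python's standard library. This is a sketch under assumptions: the function name `write_beep` and the defaults (1 kHz, 3 seconds, 44.1 kHz mono) are illustrative choices, not values from this tutorial.

```python
import math
import struct
import wave

def write_beep(path="beep.wav", duration=3.0, freq=1000.0, rate=44100, volume=0.5):
    """Write a mono 16-bit sine-wave tone to `path` and return the frame count."""
    n_frames = int(duration * rate)
    frames = bytearray()
    for i in range(n_frames):
        # sine sample in [-volume, volume], scaled to the signed 16-bit range
        sample = volume * math.sin(2 * math.pi * freq * i / rate)
        frames += struct.pack("<h", int(sample * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(bytes(frames))
    return n_frames
```

Calling `write_beep("beep.wav")` creates the file in the current directory. A tone that is at least as long as the longest word you expect to censor leaves room to trim it per word later.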

In [5]:
# upload beep sound - This is just a sample, you can replace it with quack or any other sound effect.
beep = conn.upload(file_path="beep.wav")

In [6]:
# Import Asset and Timeline
from videodb.asset import VideoAsset, AudioAsset
from videodb.timeline import Timeline

In [7]:
# create an audio asset from the beep sound
AudioAsset(asset_id=beep.id)

AudioAsset(asset_id=a-2bb25fdb-e351-4fde-918c-15f260e6a740, start=0, end=None, disable_other_tracks=True, fade_in_duration=0, fade_out_duration=0)

### Moderation
To ensure appropriate content management, it's necessary to have a method for identifying profanity and applying a predefined overlay to censor it. In this tutorial, we've included a list of curse words. Feel free to customize this list according to your requirements.


In [9]:
curse_words_list = ['shit', 'ass', 'shity', 'fuck', 'motherfucker', 'damn', 'fucking', 'motherfuker']

In [8]:
# let's review the transcript
video.get_transcript_text()

"Give a lot of home like Siri and Alexis at your house out of that fat ass whole thing, or is that hole that you can make some serious a motherfuker seriously motherfuker. Yeah people think I was joking about that. Let me show you how to do that. Watch this box. Here we go. What is the definition? What's the definition of mother? Come on, you piece of shit. Play 2 Chainz. I'm sorry about that. What's the definition of mother? She has given birth. Do you want to hear the next one? Yes. it means child Yes. What's a verb that means give birth to? Oh my God, Siri, bitch, you took it from me and moved it and then take it out. Do they take out the dead body one. Remember the first one that's a crazy one short for motherfuker. It would say as a no. Think I tweeted what it said what exactly it said. Yeah, I did it. I did it today you did and it took me a couple tries cuz I didn't say it, right. But I ended up doing it. Did it says as a noun short for motherfuker, but it could be a verb as well

### Find Curse Words
We'll use a few NLP techniques (punctuation stripping and lemmatization with spaCy) to match inflected forms of offensive words, so you don't have to list every form by hand. Reviewing the transcript also shows how these words were actually transcribed, including transcription errors such as misspellings.

In [None]:
#install spacy
!pip install spacy

In [None]:
# download spaCy's small English model
!python -m spacy download en_core_web_sm

In [10]:
# load the English model
import spacy
import re
nlp = spacy.load("en_core_web_sm")

In [1]:
def get_root_word(word):
    """
    Convert a single word into its root (lemma) form.
    """
    try:
        # strip punctuation such as trailing commas, periods, and question marks
        cleaned_word = re.sub(r'[^\w\s]', '', word)

        # run the cleaned word through the spaCy pipeline
        doc = nlp(cleaned_word)

        # take the lemma of the first token (assumes single-word input)
        lemmatized_word = doc[0].lemma_

        return lemmatized_word
    except Exception:
        # fall back to the original word if it cannot be lemmatized
        print(f"some issue with lemma for the word {word}")
        return word
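To see what the punctuation-stripping step does on its own, without loading spaCy, here is a minimal sketch. It reuses the same regular expression as `get_root_word`, but it is a simplified stand-in: it does not lemmatize, and the lowercasing step and the names `normalize` and `is_flagged` are additions for illustration.

```python
import re

def normalize(word):
    """Strip punctuation (same regex as get_root_word) and lowercase the word."""
    return re.sub(r"[^\w\s]", "", word).lower()

def is_flagged(word, flagged_words):
    """Return True if the normalized word is in the flagged collection."""
    return normalize(word) in set(flagged_words)

print(normalize("motherfuker."))          # motherfuker
print(is_flagged("Shit.", ["shit"]))      # True
print(is_flagged("mother?", ["shit"]))    # False
```

This kind of check is handy for quickly testing a word list against transcript tokens before running the full lemmatizer.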

### Review Transcript

Let's review the transcript once again and double-check whether we want to beep any other words.

In [12]:
transcript = video.get_transcript()
transcript

In [14]:
# add more words that you want to beep
curse_words_list.append('hole')
curse_words_list.append('bitch')
curse_words_list.append('mother')

### Create Fresh Timeline

Let's create a fresh timeline object and add the VideoAsset created from the original video inline. Then loop through each word in the transcript; wherever a word matches the list, add an audio overlay created from the beep sound at that word's timestamp. It's that simple!



In [15]:
# create a new Timeline object
timeline = Timeline(conn)

# add the main video inline
video_asset = VideoAsset(asset_id=video.id)
timeline.add_inline(video_asset)

for word in transcript:
    text = word.get('text')
    if text not in ['-']:
        root_word = get_root_word(text)
        if root_word in curse_words_list:
            beep_start_time = float(word.get('start'))
            beep_end_time = float(word.get('end'))
            beep_duration = beep_end_time - beep_start_time
            
            #add asset overlay of beep duration
            print(f"beep the word: {text}, {beep_start_time}:{beep_end_time} ")
            timeline.add_overlay(start=beep_start_time, asset=AudioAsset(asset_id=beep.id,start=0, end=beep_duration))
            
stream_url = timeline.generate_stream()

beep the word: ass, 3.9:4.1 
beep the word: hole, 5.0:5.5 
beep the word: motherfuker, 7.7:9.5 
beep the word: motherfuker., 11.2:11.9 
beep the word: mother?, 34.5:34.9 
beep the word: shit., 36.6:36.7 
beep the word: mother?, 44.6:45.0 
beep the word: bitch,, 66.6:67.1 
beep the word: motherfuker., 82.8:83.6 
beep the word: motherfuker,, 105.4:106.1 
beep the word: mother, 111.7:111.9 
beep the word: ass, 112.2:113.1 
beep the word: mother, 113.5:113.8 
beep the word: motherfuker., 119.9:120.4 
beep the word: mother, 138.9:139.2 
beep the word: mother?, 149.5:149.8 
beep the word: mother, 195.5:196.1 
beep the word: motherfuker,, 204.3:205.1 
beep the word: motherfuker, 225.0:225.7 
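The matching part of the loop above can be separated from the VideoDB calls so it can be tested without a connection. This is a sketch under assumptions: transcript entries are taken to be dicts with `text`, `start`, and `end` keys (as the loop above reads them), the sample data is made up, and `find_beep_spans` is a hypothetical helper that uses simple punctuation stripping and lowercasing instead of spaCy.

```python
import re

def find_beep_spans(transcript, flagged_words, normalize=None):
    """Return (start, duration, text) tuples for each flagged transcript entry."""
    if normalize is None:
        # default: strip punctuation and lowercase (no lemmatization)
        normalize = lambda w: re.sub(r"[^\w\s]", "", w).lower()
    flagged = set(flagged_words)
    spans = []
    for entry in transcript:
        text = entry.get("text", "")
        if text == "-":
            continue  # skip placeholder entries, as the loop above does
        if normalize(text) in flagged:
            start = float(entry["start"])
            end = float(entry["end"])
            spans.append((start, end - start, text))
    return spans

# made-up sample entries for illustration
sample = [
    {"start": "3.9", "end": "4.1", "text": "ass"},
    {"start": "4.1", "end": "5.0", "text": "-"},
    {"start": "36.6", "end": "36.7", "text": "shit."},
    {"start": "40.0", "end": "40.5", "text": "Play"},
]
spans = find_beep_spans(sample, ["ass", "shit"])
print([text for _, _, text in spans])  # ['ass', 'shit.']
```

Each returned tuple carries what the overlay step needs: the start time for `timeline.add_overlay` and the duration for the `AudioAsset` end value. Passing `get_root_word` as `normalize` would reuse the spaCy-based matching instead.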


### Review and Share Your Moderated Video
Finally, watch and share your new stream:

In [16]:
from videodb import play_stream
play_stream(stream_url)

'https://console.videodb.io/player?url=https://stream.videodb.io/v3/published/manifests/a49074ce-e024-4d0e-96ef-8a2ab67c5fd3.m3u8'

### The Real Power of Programmable Streams 
If your videos are already uploaded and indexed, this beep pipeline runs in real time. So, based on your users' choices or your platform's policy, you can use information from the spoken content to moderate automatically.
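One way to express per-user choices and a platform policy is to pick the word list from a small lookup before building the timeline. The sketch below is purely illustrative: the policy names, the word sets, and the helper `words_to_beep` are hypothetical and not part of VideoDB.

```python
# Hypothetical policy levels; names and word sets are illustrative only.
POLICY_WORDS = {
    "off": set(),
    "mild": {"shit", "bitch"},
    "strict": {"shit", "bitch", "damn", "ass"},
}

def words_to_beep(user_choice=None, platform_minimum="off"):
    """Union of the platform's minimum word set and the user's chosen set."""
    # unknown or missing user choices add nothing beyond the platform minimum
    chosen = POLICY_WORDS.get(user_choice, set())
    return POLICY_WORDS[platform_minimum] | chosen

print(sorted(words_to_beep("mild")))  # ['bitch', 'shit']
```

The resulting set can stand in for `curse_words_list` in the timeline loop, so the same indexed video yields a different stream per viewer.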