In [19]:
!git push

Counting objects: 58, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (57/57), done.
Writing objects: 100% (58/58), 129.68 KiB | 10.81 MiB/s, done.
Total 58 (delta 5), reused 0 (delta 0)
remote: Resolving deltas: 100% (5/5), completed with 4 local objects.
remote: This repository moved. Please use the new location:
remote:   https://github.com/basilwong/sagemaker-repo.git
To https://github.com/basilwong/awstest1.git
   f6a41aa..5c02458  master -> master


### Add Dependencies

In [1]:
import sagemaker as sage
from sagemaker import get_execution_role
from sagemaker import ModelPackage
import boto3

from datetime import datetime
import zipfile
import os
import json 
import uuid
import requests

# Make the local src/ directory importable (audio_util, processing_util).
import sys
sys.path.append('src')

!pip install pydub
import audio_util
import processing_util

You are using pip version 10.0.1, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.




In [2]:
# Execution role
role = get_execution_role()
# S3 prefixes
common_prefix = "source_separation"
batch_inference_input_prefix = common_prefix + "/batch-inference-input-data"
# Sagemaker Session
sagemaker_session = sage.Session()
# Arn for Source Separator Model Package
modelpackage_arn = 'arn:aws:sagemaker:us-east-2:057799348421:model-package/source-separation-v11570291536-75ed8128ecee95e142ec4404d884ecad'



For the corresponding IAM role, add the following policies:

* AmazonTranscribeFullAccess
* AWSMarketplaceManageSubscriptions
* AmazonPollyFullAccess
* AmazonSageMakerFullAccess
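
The policies above can also be attached programmatically. The following is a minimal sketch, assuming the four names are AWS managed policies (whose ARNs start with `arn:aws:iam::aws:policy/`) and that the caller is allowed to modify the role. The helper names `managed_policy_arn`, `role_name_from_arn` and `attach_policies` are hypothetical and not part of this repository.

```python
# Hypothetical helpers (not part of this repo) for attaching the policies listed above.
REQUIRED_POLICIES = [
    "AmazonTranscribeFullAccess",
    "AWSMarketplaceManageSubscriptions",
    "AmazonPollyFullAccess",
    "AmazonSageMakerFullAccess",
]

def managed_policy_arn(policy_name):
    # Assumption: the policy is AWS managed and has no extra path component.
    return "arn:aws:iam::aws:policy/" + policy_name

def role_name_from_arn(role_arn):
    # The role name is the last path element of the role ARN.
    return role_arn.split("/")[-1]

def attach_policies(iam_client, role_arn, policy_names=REQUIRED_POLICIES):
    # iam_client is expected to behave like boto3.client("iam").
    role_name = role_name_from_arn(role_arn)
    for name in policy_names:
        iam_client.attach_role_policy(RoleName=role_name,
                                      PolicyArn=managed_policy_arn(name))
    return role_name

# Usage (needs boto3 and IAM permissions, so it is left commented out):
# import boto3
# attach_policies(boto3.client("iam"), role)
```

Passing the client in as a parameter keeps the helper easy to try out with a stub before touching a real account.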

### Creating the Model

In [30]:
def predict_wrapper(endpoint, session):
    return sage.RealTimePredictor(endpoint, session, content_type='application/x-recordio-protobuf')

model = ModelPackage(role=role,
                     model_package_arn=modelpackage_arn,
                     sagemaker_session=sagemaker_session,
                     predictor_cls=predict_wrapper)

### Running the Batch Job

Note that an initial audio file longer than about 30 seconds is too large for the model. The split_mp3() method in src.audio_util works around this by splitting an mp3 file into 30-second segments.

This method requires ffmpeg as a dependency since it uses pydub. Instead of installing ffmpeg on the notebook instance, the code below was executed locally on a machine with ffmpeg installed (```apt-get install ffmpeg``` on an Ubuntu machine, as I had trouble figuring out how to install it via yum).

The output of split_mp3() has already been added to this repository, so there is no need to run it for demo purposes.
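
As an illustration of the splitting step, here is a minimal sketch of how 30-second segment boundaries could be computed. This is not the code in src.audio_util: `segment_bounds` is a hypothetical helper, and the pydub calls in the trailing comment (with a made-up file path) assume ffmpeg is installed.

```python
def segment_bounds(total_ms, segment_ms=30000):
    """Return (start_ms, end_ms) pairs covering [0, total_ms) in fixed-size chunks."""
    if segment_ms <= 0:
        raise ValueError("segment_ms must be positive")
    return [(start, min(start + segment_ms, total_ms))
            for start in range(0, total_ms, segment_ms)]

# A 65-second file yields two full segments and one shorter tail segment.
print(segment_bounds(65000))

# With pydub (requires ffmpeg), each pair could be used to slice and export a segment:
# from pydub import AudioSegment
# song = AudioSegment.from_mp3("songs/example.mp3")  # hypothetical path
# for n, (start, end) in enumerate(segment_bounds(len(song)), start=1):
#     song[start:end].export("input{}.mp3".format(n), format="mp3")
```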

In [74]:
# The following line requires ffmpeg (which provides ffprobe) to be installed.
audio_util.split_mp3("songs/drake-toosie_slide.mp3", "../source-separation-input/")



FileNotFoundError: [Errno 2] No such file or directory: 'ffprobe': 'ffprobe'
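
The error above appears when the ffprobe executable (shipped with ffmpeg) cannot be found on the PATH. A minimal sketch of a pre-flight check using only the standard library follows; `missing_tools` is a hypothetical helper name, not part of audio_util.

```python
import shutil

def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = missing_tools()
if missing:
    print("Install these before calling split_mp3(): " + ", ".join(missing))
else:
    print("ffmpeg and ffprobe found.")
```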

In [31]:
batch_input_folder = "source-separation-input"

transform_input = sagemaker_session.upload_data(batch_input_folder, key_prefix=batch_inference_input_prefix)
print("Transform input uploaded to " + transform_input)

Transform input uploaded to s3://sagemaker-us-east-2-075178354542/source_separation/batch-inference-input-data


In [32]:
bucket = sagemaker_session.default_bucket()

transformer = model.transformer(1, 'ml.m4.xlarge', strategy='SingleRecord', output_path='s3://'+bucket+'/'+common_prefix+'/batch-transform-output')
transformer.transform(transform_input, content_type='application/x-recordio-protobuf')
transformer.wait()

print("Batch Transform output saved to " + transformer.output_path)

..................Starting the inference server with 4 workers.
[2020-04-15 00:39:31 +0000] [11] [INFO] Starting gunicorn 19.9.0
[2020-04-15 00:39:31 +0000] [11] [INFO] Listening at: unix:/tmp/gunicorn.sock (11)
[2020-04-15 00:39:31 +0000] [11] [INFO] Using worker: gevent
[2020-04-15 00:39:31 +0000] [15] [INFO] Booting worker with pid: 15
[2020-04-15 00:39:31 +0000] [16] [INFO] Booting worker with pid: 16
[2020-04-15 00:39:31 +0000] [17] [INFO] Booting worker with pid: 17
[2020-04-15 00:39:31 +0000] [18] [INFO] Booting worker with pid: 18
Testing...
2020-04-15 00:40:07.431729: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
169.254.255.130 - - [15/Apr/2020:00:40:07 +0000] "GET /ping HTTP/1.1" 200 1 "-" "Go-http-client/1.1"
169.254.255.130 - - [15/Apr/2020:00:40:07 +0000] "GET /execution-

### Processing the Batch Output

In [33]:
# Downloading files from s3.
s3 = boto3.resource('s3')
my_bucket = s3.Bucket(sagemaker_session.default_bucket())
prefix = "source_separation/batch-transform-output/"
i = 0
audio_util.clear_folder('source-separation-output/batch-transform-output')
for object_summary in my_bucket.objects.filter(Prefix=prefix):
    i = i + 1
    file_name = object_summary.key.split('/')[-1]
    print(file_name)
    my_bucket.download_file(prefix + file_name, 'source-separation-output/batch-transform-output/output-{}.zip'.format(i))

input1.mp3.out
input2.mp3.out
input3.mp3.out
input4.mp3.out
input5.mp3.out
input6.mp3.out
input7.mp3.out


In [34]:
# Extracting files from zip files. 
audio_util.clear_folder('source-separation-output/extracted')
for file in os.listdir('source-separation-output/batch-transform-output'):
    print(file)
    with zipfile.ZipFile('source-separation-output/batch-transform-output/'+file, 'r') as zip_ref:
        zip_ref.extractall('source-separation-output/extracted/'+file.split('.')[0]+'/')

output-3.zip
output-6.zip
output-1.zip
output-4.zip
output-7.zip
output-2.zip
output-5.zip


In [35]:
# Separating the vocal files and the background sound files.
audio_util.clear_folder('source-separation-output/vocals')
audio_util.clear_folder('source-separation-output/background')
for i, folder in enumerate(sorted(os.listdir('source-separation-output/extracted/'))):
    for file in os.listdir('source-separation-output/extracted/' + folder + '/output'):
        new_file_name = str(i).zfill(5) + ".wav"
        if "vocals" in file:
            os.rename('source-separation-output/extracted/' + folder + '/output/' + file, 'source-separation-output/vocals/vocals' + new_file_name)
        elif "accompaniment" in file:
            os.rename('source-separation-output/extracted/' + folder + '/output/' + file, 'source-separation-output/background/background' + new_file_name)
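
One caveat with the loop above: sorted() orders names lexicographically, so with ten or more segments 'output-10' would sort before 'output-2' and the zero-padded indices would no longer follow the song order. A minimal sketch of a number-aware sort key is shown below; `natural_key` is a hypothetical helper, not part of audio_util.

```python
import re

def natural_key(name):
    """Split a name into text and integer parts so embedded numbers compare numerically."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]

folders = ["output-10", "output-2", "output-1"]
print(sorted(folders))                   # lexicographic order
print(sorted(folders, key=natural_key))  # numeric-aware order
```

Passing `key=natural_key` to the sorted() call would keep the segments in numeric order for longer songs.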

### Transcribe the Vocal Files

In [36]:
# Upload the Vocal files onto s3
local_vocals_folder = "source-separation-output/vocals/"
transcribe_input_prefix = "transcribe-input"

transcribe_input = sagemaker_session.upload_data(local_vocals_folder, key_prefix=transcribe_input_prefix)
print("Transcribe input uploaded to " + transcribe_input)

Transcribe input uploaded to s3://sagemaker-us-east-2-075178354542/transcribe-input


In [60]:
# Start a transcription job for each file. Add the transcription to finished_jobs once finished.
import time

transcribe = boto3.client('transcribe')
output_bucket_name = "transcribe-output"
audio_util.clear_folder('transcribe-output')
uri_prefix = "https://sagemaker-us-east-2-075178354542.s3.us-east-2.amazonaws.com/transcribe-input/"
finished_jobs = list()

for file in sorted(os.listdir(local_vocals_folder)):

    print("Transcribing: " + file)
    job_uri = uri_prefix + file
    transcribe.start_transcription_job(
        TranscriptionJobName=file,
        Media={'MediaFileUri': job_uri},
        MediaFormat='wav',
        LanguageCode='en-US'
    )
    while True:
        status = transcribe.get_transcription_job(TranscriptionJobName=file)
        if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            break
        # Pause between polls so the loop does not hammer the Transcribe API.
        time.sleep(5)
    
    api_data = requests.get(url=status['TranscriptionJob']['Transcript']['TranscriptFileUri'])
    data = api_data.json()
    finished_jobs.append(data)
    dump_file_name = 'transcribe-output/transcription' + file.split(".")[0] + '.json'
    # Writing to json files for analysis purposes.
    with open(dump_file_name, 'w') as f:
        json.dump(data, f, indent=4)
    transcribe.delete_transcription_job(TranscriptionJobName=file)
    
finished_jobs.sort(key=lambda x : x['jobName'])

Transcribing: vocals00000.wav
Transcribing: vocals00001.wav
Transcribing: vocals00002.wav
Transcribing: vocals00003.wav
Transcribing: vocals00004.wav
Transcribing: vocals00005.wav
Transcribing: vocals00006.wav
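
The polling loop above can be factored into a small helper with a delay and a timeout. This is a sketch under assumptions: `wait_for_job` and its parameters are hypothetical, and `get_status` stands in for a call such as transcribe.get_transcription_job(...) reduced to its status string.

```python
import time

def wait_for_job(get_status, terminal=("COMPLETED", "FAILED"),
                 interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or the timeout elapses."""
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if waited >= timeout_s:
            raise TimeoutError("job still {} after {}s".format(status, waited))
        sleep(interval_s)
        waited += interval_s

# Usage with Transcribe (commented out because it needs AWS credentials):
# status = wait_for_job(lambda: transcribe.get_transcription_job(
#     TranscriptionJobName=file)['TranscriptionJob']['TranscriptionJobStatus'])
# if status == 'FAILED':
#     print("Transcription failed for " + file)  # a failed job has no transcript to download
```

Returning the final status lets the caller skip the transcript download for failed jobs instead of hitting a missing key.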


In [59]:
transcribe.delete_transcription_job(TranscriptionJobName="vocals00000.wav")

{'ResponseMetadata': {'RequestId': 'd9ebc7de-4303-47be-bd62-554da8e183b6',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Wed, 15 Apr 2020 01:09:40 GMT',
   'x-amzn-requestid': 'd9ebc7de-4303-47be-bd62-554da8e183b6',
   'content-length': '0',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

### Processing the Transcribe Output

In [4]:
# Adjustable variables:

# Short words tend to be transcribed with too short a duration, so they are extended manually.
extend_word_length_factor = 200 # noted as percent of total word duration; the stitching cell adds it directly to end_time
word_under_x_ms_long = 500

# add_pause_for_gaps_greater_than = 100 # ms

In [13]:
# Patching the batches back together, generate transcription list from all the batches. 
transcribe_output_folder = "transcribe-output/"
offset = 0 # Takes into account that batches are sequential.
transcription_list = list()
index = 0
for file in sorted(os.listdir(transcribe_output_folder)):
    with open(transcribe_output_folder + file, "r", encoding="utf-8") as batch_file:
        transcription_batch = json.load(batch_file)
    for map_item in transcription_batch["results"]["items"]:
        transcribe_object = processing_util.TranscriptionItem(map_item, index, offset)
        index += 1
        # Skip punctuation; keep only words.
        if transcribe_object.is_word():
            # Increase word duration if very short.
            if transcribe_object.duration() < word_under_x_ms_long:
                transcribe_object.end_time += extend_word_length_factor
            transcription_list.append(transcribe_object)

    # Each batch covers one 30-second segment, so advance the offset by 30000 (30 s in ms).
    offset += 30000
    
# Write the stitched transcription to a JSON file.
transcribed_song_folder = "song-transcription/"
audio_util.clear_folder(transcribed_song_folder)
with open(transcribed_song_folder + "transcribed_song.json", 'w') as outfile:
    json.dump([item.to_dict() for item in transcription_list], outfile, indent=4)
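
For readers without access to processing_util, the stitching logic can be sketched on plain dictionaries. This is not the TranscriptionItem implementation. It assumes the Amazon Transcribe item layout (string start_time and end_time in seconds, a type of "pronunciation" or "punctuation", and the word in alternatives[0]["content"]), that each batch corresponds to one 30-second segment, and that the extension is a fixed number of milliseconds. The name `stitch_batches` and the sample data are hypothetical.

```python
def stitch_batches(batches, segment_ms=30000, short_ms=500, extend_ms=200):
    """Merge per-segment Transcribe items into one word list on a shared timeline (ms)."""
    words = []
    for batch_index, items in enumerate(batches):
        offset = batch_index * segment_ms  # segments are assumed to be sequential
        for item in items:
            if item["type"] != "pronunciation":
                continue  # punctuation items carry no timestamps
            start = int(round(float(item["start_time"]) * 1000)) + offset
            end = int(round(float(item["end_time"]) * 1000)) + offset
            if end - start < short_ms:
                end += extend_ms  # lengthen very short words
            words.append({"content": item["alternatives"][0]["content"],
                          "start_ms": start, "end_ms": end})
    return words

# Made-up sample data in the assumed layout: two segments, one punctuation item.
example = [
    [{"type": "pronunciation", "start_time": "1.20", "end_time": "1.50",
      "alternatives": [{"content": "imagine"}]},
     {"type": "punctuation", "alternatives": [{"content": ","}]}],
    [{"type": "pronunciation", "start_time": "0.00", "end_time": "0.90",
      "alternatives": [{"content": "heaven"}]}],
]
for word in stitch_batches(example):
    print(word)
```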

### Giving Transcriptions to Amazon Polly

Amazon Polly is queried for each individual word to allow for easier control of timing and pitch.

In [8]:
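def query_polly(polly_client, word, length, prefix, output_folder):
    # Note: output_folder is currently unused; the synthesized audio is written to S3.
    # Cap the spoken duration of the word with SSML so it fits its original time slot.
    ssml = """<speak><prosody amazon:max-duration="{max_len}ms">{word}</prosody></speak>""".format(max_len=str(length), word=word)
    response = polly_client.start_speech_synthesis_task(VoiceId='Joey',
                OutputS3BucketName='sagemaker-us-east-2-075178354542',
                OutputS3KeyPrefix='polly-output/' + str(prefix),
                OutputFormat='mp3',
                TextType='ssml',
                Text=ssml)
    # Return the response so that callers receive the synthesis task details.
    return response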
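
Because each word is interpolated into SSML, a character such as & or < in a transcription would produce invalid markup. A minimal sketch of building the same string with XML escaping follows; `build_ssml` is a hypothetical helper, and it reuses the amazon:max-duration attribute from the cell above.

```python
from xml.sax.saxutils import escape

def build_ssml(word, max_duration_ms):
    """Wrap a word in SSML that caps its spoken duration, escaping XML special characters."""
    return '<speak><prosody amazon:max-duration="{}ms">{}</prosody></speak>'.format(
        int(max_duration_ms), escape(word))

print(build_ssml("there's", 450))
print(build_ssml("rock & roll", 1200))
```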


In [14]:
polly_client = boto3.client('polly')
polly_output_folder = "polly-output/"

for transcribe_object in transcription_list:
    
    response = query_polly(polly_client, transcribe_object.content, transcribe_object.duration(), transcribe_object.index, polly_output_folder)

    print("Polly Queried for: " + transcribe_object.content)

Polly Queried for: imagine
Polly Queried for: there's
Polly Queried for: no
Polly Queried for: heaven
Polly Queried for: It's
Polly Queried for: easy
Polly Queried for: if
Polly Queried for: you
Polly Queried for: try
Polly Queried for: No
Polly Queried for: Hell
Polly Queried for: no
Polly Queried for: above
Polly Queried for: us
Polly Queried for: Only
Polly Queried for: sky
Polly Queried for: Imagine
Polly Queried for: all
Polly Queried for: the
Polly Queried for: people
Polly Queried for: with
Polly Queried for: today
Polly Queried for: it
Polly Queried for: isn't
Polly Queried for: todo
Polly Queried for: nothing
Polly Queried for: to
Polly Queried for: kill
Polly Queried for: No
Polly Queried for: religion
Polly Queried for: Thio
Polly Queried for: Imagine
Polly Queried for: all
Polly Queried for: the
Polly Queried for: bbo
Polly Queried for: My
Polly Queried for: me
Polly Queried for: is
Polly Queried for: you
Polly Queried for: You
Polly Queried for: may
Polly Queried for: say


### Processing the Output from Amazon Polly

In [16]:
# Downloading files from s3.
s3 = boto3.resource('s3')
my_bucket = s3.Bucket(sagemaker_session.default_bucket())
prefix = "polly-output/"
audio_util.clear_folder(prefix)


for object_summary in my_bucket.objects.filter(Prefix=prefix):
    file_name = object_summary.key.split('/')[-1]
    my_bucket.download_file(prefix + file_name, prefix + file_name)
    
print("Files moved from s3 to repo.")

Files moved from s3 to repo.


Mixing Audio:

https://stackoverflow.com/questions/7629873/how-do-i-mix-audio-files-using-python

Pitch Modulation:

https://stackoverflow.com/questions/38923438/does-pydub-support-pitch-modulation

Create Video:

https://helpdeskgeek.com/windows-10/how-to-record-your-screen-on-windows-10/

Music Video:

https://www.oneimagevideo.com/