In [61]:
!git push

Counting objects: 228, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (225/225), done.
Writing objects: 100% (228/228), 52.55 MiB | 4.54 MiB/s, done.
Total 228 (delta 80), reused 0 (delta 0)
remote: Resolving deltas: 100% (80/80), completed with 2 local objects.
remote: This repository moved. Please use the new location:
remote:   https://github.com/basilwong/sagemaker-repo.git
To https://github.com/basilwong/awstest1.git
   d1d81de..d49dcea  master -> master


### Add Dependencies

In [2]:
import sagemaker as sage
from sagemaker import get_execution_role
from sagemaker import ModelPackage
import boto3

from datetime import datetime
import zipfile
import os
import json 
import uuid

# Make the modules in src/ importable.
import sys
# Append src to the end of the module search path.
sys.path.append('src')

!pip install pydub
import audio_util
import processing_util

Collecting pydub
  Downloading https://files.pythonhosted.org/packages/79/db/eaf620b73a1eec3c8c6f8f5b0b236a50f9da88ad57802154b7ba7664d0b8/pydub-0.23.1-py2.py3-none-any.whl
Installing collected packages: pydub
Successfully installed pydub-0.23.1
You are using pip version 10.0.1, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.




In [3]:
# Execution role
role = get_execution_role()
# S3 prefixes
common_prefix = "source_separation"
batch_inference_input_prefix = common_prefix + "/batch-inference-input-data"
# Sagemaker Session
sagemaker_session = sage.Session()
# Arn for Source Separator Model Package
modelpackage_arn = 'arn:aws:sagemaker:us-east-2:057799348421:model-package/source-separation-v11570291536-75ed8128ecee95e142ec4404d884ecad'



Attach the following policies to the corresponding IAM role:

* AmazonTranscribeFullAccess
* AWSMarketplaceManageSubscriptions
* AmazonPollyFullAccess
* AmazonSageMakerFullAccess
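
The policies can also be attached programmatically through the IAM API. The sketch below assumes all four are AWS managed policies sharing the standard `arn:aws:iam::aws:policy/` prefix, and the role name in the usage comment is a hypothetical placeholder. Attaching them in the IAM console works equally well.

```python
# AWS managed policy names listed above.
POLICY_NAMES = [
    "AmazonTranscribeFullAccess",
    "AWSMarketplaceManageSubscriptions",
    "AmazonPollyFullAccess",
    "AmazonSageMakerFullAccess",
]

# Assumption: all four are AWS managed policies, which share this ARN prefix.
MANAGED_POLICY_PREFIX = "arn:aws:iam::aws:policy/"


def managed_policy_arn(name):
    """Build the ARN of an AWS managed policy from its name."""
    return MANAGED_POLICY_PREFIX + name


def attach_policies(iam_client, role_name, policy_names=POLICY_NAMES):
    """Attach each policy to the role and return the ARNs that were attached."""
    arns = [managed_policy_arn(name) for name in policy_names]
    for arn in arns:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    return arns


# Usage (requires IAM permissions; the role name is a placeholder):
# import boto3
# attach_policies(boto3.client("iam"), "my-sagemaker-execution-role")
```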

### Creating the Model

In [3]:
def predict_wrapper(endpoint, session):
    return sage.RealTimePredictor(endpoint, session, content_type='application/x-recordio-protobuf')

model = ModelPackage(role=role,
                     model_package_arn=modelpackage_arn,
                     sagemaker_session=sagemaker_session,
                     predictor_cls=predict_wrapper)

### Running the Batch Job

Note that an audio file longer than about 30 seconds is too large for the model. The split_mp3() method in src.audio_util works around this by splitting an mp3 file into 30-second segments.

This method requires ffmpeg because it uses pydub. Instead of installing ffmpeg on the notebook instance, the cell below was executed locally on a machine with ffmpeg installed (```apt-get install ffmpeg``` on Ubuntu; I had trouble figuring out how to install it via yum).

The output of split_mp3() has already been added to this repository, so there is no need to run it for demo purposes.
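
For readers without access to src/audio_util.py, the splitting step might look roughly like the sketch below. It is not the repository's split_mp3() implementation: the function names, the `input{n}.mp3` output naming, and the 30000 ms default are assumptions for illustration. The pydub import is kept inside the function because it only works when ffmpeg is installed.

```python
def segment_bounds(total_ms, segment_ms=30000):
    """Return (start, end) millisecond pairs covering [0, total_ms)."""
    return [(start, min(start + segment_ms, total_ms))
            for start in range(0, total_ms, segment_ms)]


def split_audio(input_path, output_folder, segment_ms=30000):
    """Write consecutive segments of input_path as input1.mp3, input2.mp3, ..."""
    import os
    from pydub import AudioSegment  # needs ffmpeg/ffprobe on the PATH

    audio = AudioSegment.from_mp3(input_path)
    os.makedirs(output_folder, exist_ok=True)
    bounds = segment_bounds(len(audio), segment_ms)  # len() is in milliseconds
    for n, (start, end) in enumerate(bounds, start=1):
        # pydub slices are indexed in milliseconds.
        segment = audio[start:end]
        segment.export(os.path.join(output_folder, "input{}.mp3".format(n)),
                       format="mp3")
```

The last segment is simply shorter than the others when the track length is not a multiple of 30 seconds.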

In [74]:
# The following line requires ffmpeg (which provides ffprobe) to be installed.
audio_util.split_mp3("songs/drake-toosie_slide.mp3", "../source-separation-input/")



FileNotFoundError: [Errno 2] No such file or directory: 'ffprobe': 'ffprobe'
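
The FileNotFoundError above is raised because pydub shells out to ffprobe, which is not installed on the notebook instance. A quick way to check for the tools before calling split_mp3() is sketched below; the helper name `missing_tools` is hypothetical.

```python
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
```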

In [4]:
batch_input_folder = "source-separation-input"


transform_input = sagemaker_session.upload_data(batch_input_folder, key_prefix=batch_inference_input_prefix)
print("Transform input uploaded to " + transform_input)

Transform input uploaded to s3://sagemaker-us-east-2-075178354542/source_separation/batch-inference-input-data


In [5]:
bucket = sagemaker_session.default_bucket()

transformer = model.transformer(1, 'ml.m4.xlarge', strategy='SingleRecord', output_path='s3://'+bucket+'/'+common_prefix+'/batch-transform-output')
transformer.transform(transform_input, content_type='application/x-recordio-protobuf')
transformer.wait()

print("Batch Transform output saved to " + transformer.output_path)

....................Starting the inference server with 4 workers.
[2020-04-13 04:48:15 +0000] [10] [INFO] Starting gunicorn 19.9.0
[2020-04-13 04:48:15 +0000] [10] [INFO] Listening at: unix:/tmp/gunicorn.sock (10)
[2020-04-13 04:48:15 +0000] [10] [INFO] Using worker: gevent
[2020-04-13 04:48:15 +0000] [14] [INFO] Booting worker with pid: 14
[2020-04-13 04:48:15 +0000] [15] [INFO] Booting worker with pid: 15
[2020-04-13 04:48:15 +0000] [16] [INFO] Booting worker with pid: 16
[2020-04-13 04:48:15 +0000] [17] [INFO] Booting worker with pid: 17
Testing...
Testing...
2020-04-13 04:48:39.079016: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
169.254.255.130 - - [13/Apr/2020:04:48:39 +0000] "GET /ping HTTP/1.1" 200 1 "-" "Go-http-client/1.1"
169.254.255.130 - - [13/Apr/2020:04:48:39 +

['audio_file_1586753595.1332912.mp3_vocals.wav', 'audio_file_1586753595.1332912.mp3_accompaniment.wav']
['audio_file_1586753595.1332912.mp3_vocals.wav', 'audio_file_1586753595.1332912.mp3_accompaniment.wav']
169.254.255.130 - - [13/Apr/2020:04:54:01 +0000] "POST /invocations HTTP/1.1" 200 19459102 "-" "Go-http-client/1.1"
Input path : /tmp/audio_file_1586753641.1667993.mp3
Producing source estimates for input mixture file /tmp/audio_file_1586753641.1667993.mp3
Testing...
169.254.255.130 - - [13/Apr/2020:04:54:01 +0000] "POST /invocations HTTP/1.1" 200 19459102 "-" "Go-http-client/1.1"
Input path : /tmp/audio_file_1586753641.1667993.mp3
Producing source estimates for input mixture file /tmp/audio_file_1586753641.1667993.mp3
Testing...
Num of variables64
Pre-trained model restored for song prediction
Num of variables64
Pre-trained model restored for song prediction

### Processing the Batch Output

In [142]:
# Download the batch transform output from S3.
s3 = boto3.resource('s3')
my_bucket = s3.Bucket(sagemaker_session.default_bucket())
prefix = "source_separation/batch-transform-output/"
audio_util.clear_folder('source-separation-output/batch-transform-output')
for i, object_summary in enumerate(my_bucket.objects.filter(Prefix=prefix), start=1):
    file_name = object_summary.key.split('/')[-1]
    print(file_name)
    # Each .out object is a zip archive, so save it with a .zip extension.
    my_bucket.download_file(object_summary.key, 'source-separation-output/batch-transform-output/output-{}.zip'.format(i))

input1.mp3.out
input2.mp3.out
input3.mp3.out
input4.mp3.out
input5.mp3.out
input6.mp3.out
input7.mp3.out
input8.mp3.out
input9.mp3.out


In [148]:
# Extracting files from zip files. 
audio_util.clear_folder('source-separation-output/extracted')
for file in os.listdir('source-separation-output/batch-transform-output'):
    print(file)
    with zipfile.ZipFile('source-separation-output/batch-transform-output/'+file, 'r') as zip_ref:
        zip_ref.extractall('source-separation-output/extracted/'+file.split('.')[0]+'/')

output-3.zip
output-6.zip
output-8.zip
output-9.zip
output-1.zip
output-4.zip
output-7.zip
output-2.zip
output-5.zip


In [1]:
# Separating the vocal files and the background sound files.
audio_util.clear_folder('source-separation-output/vocals')
audio_util.clear_folder('source-separation-output/background')
for i, folder in enumerate(sorted(os.listdir('source-separation-output/extracted/'))):
    for file in os.listdir('source-separation-output/extracted/' + folder + '/output'):
        new_file_name = str(i).zfill(5) + ".wav"
        if "vocals" in file:
            os.rename('source-separation-output/extracted/' + folder + '/output/' + file, 'source-separation-output/vocals/vocals' + new_file_name)
        elif "accompaniment" in file:
            os.rename('source-separation-output/extracted/' + folder + '/output/' + file, 'source-separation-output/background/background' + new_file_name)

NameError: name 'audio_util' is not defined

### Transcribe the Vocal Files

In [151]:
# Upload the Vocal files onto s3
local_vocals_folder = "source-separation-output/vocals/"
transcribe_input_prefix = "transcribe-input"

transcribe_input = sagemaker_session.upload_data(local_vocals_folder, key_prefix=transcribe_input_prefix)
print("Transcribe input uploaded to " + transcribe_input)

Transcribe input uploaded to s3://sagemaker-us-east-2-075178354542/transcribe-input


In [None]:
# Start a transcription job for each file and collect each transcript once its job finishes.
import time
import requests

transcribe = boto3.client('transcribe')
output_bucket_name = "transcribe-output"
audio_util.clear_folder('transcribe-output')
uri_prefix = "https://sagemaker-us-east-2-075178354542.s3.us-east-2.amazonaws.com/transcribe-input/"
finished_jobs = list()

for file in sorted(os.listdir(local_vocals_folder)):

    print("Transcribing: " + file)
    job_uri = uri_prefix + file
    transcribe.start_transcription_job(
        TranscriptionJobName=file,
        Media={'MediaFileUri': job_uri},
        MediaFormat='wav',
        LanguageCode='en-US'
    )
    # Poll until the job finishes, pausing between checks to avoid hammering the API.
    while True:
        status = transcribe.get_transcription_job(TranscriptionJobName=file)
        if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            break
        time.sleep(5)

    # A failed job has no transcript, so only fetch results for completed jobs.
    if status['TranscriptionJob']['TranscriptionJobStatus'] == 'COMPLETED':
        api_data = requests.get(url=status['TranscriptionJob']['Transcript']['TranscriptFileUri'])
        data = api_data.json()
        finished_jobs.append(data)
        dump_file_name = 'transcribe-output/transcription' + file.split(".")[0] + '.json'
        # Writing to json files for analysis purposes.
        with open(dump_file_name, 'w') as f:
            json.dump(data, f, indent=4)
    else:
        print("Transcription failed: " + file)
    transcribe.delete_transcription_job(TranscriptionJobName=file)

finished_jobs.sort(key=lambda x : x['jobName'])

Transcribing: vocals00000.wav
Transcribing: vocals00001.wav


In [158]:
transcribe.delete_transcription_job(TranscriptionJobName="vocals00000.wav")

{'ResponseMetadata': {'RequestId': '20153835-a1cb-4d8b-b31b-fc5194cf9a11',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Mon, 13 Apr 2020 23:21:58 GMT',
   'x-amzn-requestid': '20153835-a1cb-4d8b-b31b-fc5194cf9a11',
   'content-length': '0',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}
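
Waiting for a transcription job can be factored into a helper that pauses between status checks and gives up after a timeout. This is a minimal sketch: the helper name and the default intervals are assumptions, and `transcribe_client` is expected to be a `boto3.client('transcribe')` object.

```python
import time


def wait_for_transcription(transcribe_client, job_name,
                           poll_seconds=5, timeout_seconds=600):
    """Poll until the job is COMPLETED or FAILED; return the final job description."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name)
        job = response["TranscriptionJob"]
        if job["TranscriptionJobStatus"] in ("COMPLETED", "FAILED"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("Timed out waiting for " + job_name)
        time.sleep(poll_seconds)
```

The caller still has to check whether the returned status is COMPLETED before reading the transcript URI, since a failed job has none.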

### Giving Transcriptions to Amazon Polly

Amazon Polly is queried for each individual word to allow for easier control of timing and pitch.
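
To illustrate the per-word request, the sketch below builds the SSML for a single word with a capped duration. The helper name `build_word_ssml` is hypothetical; escaping the word and rounding the duration to whole milliseconds are precautions assumed here, not something the notebook's own code does.

```python
from xml.sax.saxutils import escape


def build_word_ssml(word, max_duration_ms):
    """Wrap one word in a prosody tag that caps its spoken duration."""
    template = '<speak><prosody amazon:max-duration="{}ms">{}</prosody></speak>'
    # escape() replaces &, < and > so the word cannot break the SSML markup.
    return template.format(int(round(max_duration_ms)), escape(word))


print(build_word_ssml("rock & roll", 516.0))
# <speak><prosody amazon:max-duration="516ms">rock &amp; roll</prosody></speak>
```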

In [5]:
# Adjustable Variable:

# Transcribed durations of short words tend to be too short, so they are extended manually.
extend_word_length_factor = 0.2 # (fraction of the word's duration)

In [6]:
def query_polly(polly_client, word, length, prefix, output_folder):
    # Note: output_folder is currently unused, since Polly writes its output to S3.
    ssml = """<speak><prosody amazon:max-duration="{max_len}ms">{word}</prosody></speak>""".format(max_len=str(length), word=word)          
    response = polly_client.start_speech_synthesis_task(VoiceId='Joey',
                OutputS3BucketName='sagemaker-us-east-2-075178354542',
                OutputS3KeyPrefix='polly-output/' + prefix,
                OutputFormat='mp3', 
                TextType = 'ssml',
                Text = ssml)
    # Return the task description so that callers can inspect it.
    return response


In [7]:
# Patching the batches back together, generate transcription list from all the batches. 
transcribe_output_folder = "transcribe-output/"
offset = 0 # Takes into account that batches are sequential.
transcription_list = list()
for file in sorted(os.listdir(transcribe_output_folder)):
    with open(transcribe_output_folder + file, "r", encoding="utf-8") as f:
        transcription_batch = json.load(f)
    for map_item in transcription_batch["results"]["items"]:
        transcribe_object = processing_util.TranscriptionItem(map_item, offset)
        # Skip punctuation
        if transcribe_object.is_word():
            transcription_list.append(transcribe_object)
    # Each batch covers one 30-second segment, so shift the next batch by 30000 ms.
    offset += 30000
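
The TranscriptionItem class lives in src/processing_util.py and is not shown in this notebook. A hypothetical stand-in with the same interface used here (is_word(), duration(), start_time, end_time, content, confidence) might look like the sketch below. It assumes the usual Amazon Transcribe item layout, where times are strings in seconds and punctuation items carry no timestamps; the real class may differ.

```python
class WordItem:
    """Hypothetical stand-in for processing_util.TranscriptionItem."""

    def __init__(self, item, offset_ms=0):
        self.type = item["type"]  # "pronunciation" or "punctuation"
        best = item["alternatives"][0]
        self.content = best["content"]
        self.confidence = float(best.get("confidence") or 0.0)
        # Convert seconds (as strings) to milliseconds and shift by the batch offset.
        self.start_time = self._to_ms(item.get("start_time"), offset_ms)
        self.end_time = self._to_ms(item.get("end_time"), offset_ms)

    @staticmethod
    def _to_ms(seconds, offset_ms):
        if seconds is None:  # punctuation items have no timestamps
            return None
        return round(float(seconds) * 1000) + offset_ms

    def is_word(self):
        return self.type == "pronunciation"

    def duration(self):
        return self.end_time - self.start_time
```

With an offset of 30000, a word that starts 0.04 seconds into the second batch gets a start time of 30040 ms on the timeline of the full song.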

In [None]:
polly_client = boto3.client('polly')

transcribe_output_folder = "transcribe-output/"
polly_output_folder = "polly-output/"
audio_util.clear_folder(polly_output_folder)
silence_dict = {"length" : 0}
index = 0
expected_start_time = 0

for i, transcribe_object in enumerate(transcription_list):
    
    # Create a pause if the gap between words is at least 500 ms.
    if expected_start_time + 500 <= transcribe_object.start_time:
        silence_dict["length"] = transcribe_object.start_time - expected_start_time
        with open(polly_output_folder + str(index).zfill(5) + ".json", 'w') as outfile:
            json.dump(silence_dict, outfile)
        expected_start_time = transcribe_object.start_time
        index += 1
    
    if transcribe_object.duration() < 500:
        transcribe_object.end_time += transcribe_object.duration() * extend_word_length_factor
        
    response = query_polly(polly_client, transcribe_object.content, transcribe_object.duration(), str(index).zfill(5), polly_output_folder)
    expected_start_time = transcribe_object.end_time
    print("Polly Queried " + str(index) + " for: " + transcribe_object.content, "| Word Duration(ms): " + str(transcribe_object.duration()), "| Transcription Confidence: " + str(transcribe_object.confidence))  
    print(expected_start_time)
    index += 1

Polly Queried 0 for: Oh | Word Duration(ms): 516.0 | Transcription Confidence: 0.871
556.0
Polly Queried 2 for: black | Word Duration(ms): 552.0 | Transcription Confidence: 1.0
10662.0
Polly Queried 3 for: leather | Word Duration(ms): 444.0 | Transcription Confidence: 1.0
11024.0
Polly Queried 4 for: blood | Word Duration(ms): 480.0 | Transcription Confidence: 0.9456
11440.0
Polly Queried 5 for: No | Word Duration(ms): 300.0 | Transcription Confidence: 0.8913
11660.0
Polly Queried 6 for: seafood | Word Duration(ms): 780.0 | Transcription Confidence: 0.9966
12390.0
Polly Queried 8 for: buckles | Word Duration(ms): 560.0 | Transcription Confidence: 0.9944
13650.0
Polly Queried 9 for: on | Word Duration(ms): 204.0 | Transcription Confidence: 0.9998
13854.0
Polly Queried 10 for: a | Word Duration(ms): 132.0 | Transcription Confidence: 0.9204
13952.0
Polly Queried 11 for: jacket | Word Duration(ms): 528.0 | Transcription Confidence: 0.999
14458.0
Polly Queried 12 for: It's | Word Duration(m

Polly Queried 101 for: dance | Word Duration(ms): 576.0 | Transcription Confidence: 1.0
47576.0
Polly Queried 102 for: with | Word Duration(ms): 180.0 | Transcription Confidence: 1.0
47660.0
Polly Queried 103 for: me | Word Duration(ms): 540.0 | Transcription Confidence: 1.0
48170.0
Polly Queried 104 for: No | Word Duration(ms): 560.0 | Transcription Confidence: 0.6201
48730.0
Polly Queried 105 for: I | Word Duration(ms): 180.0 | Transcription Confidence: 0.9999
49120.0
Polly Queried 106 for: could | Word Duration(ms): 264.0 | Transcription Confidence: 0.9827
49354.0
Polly Queried 107 for: guess | Word Duration(ms): 276.0 | Transcription Confidence: 0.7737
49586.0
Polly Queried 108 for: I'm | Word Duration(ms): 168.0 | Transcription Confidence: 0.7271
49708.0
Polly Queried 109 for: Michael | Word Duration(ms): 336.0 | Transcription Confidence: 0.9955
50016.0
Polly Queried 110 for: J | Word Duration(ms): 610.0 | Transcription Confidence: 0.9907
50570.0
Polly Queried 111 for: Son | Word 

Polly Queried 195 for: That's | Word Duration(ms): 204.0 | Transcription Confidence: 0.3464
86234.0
Polly Queried 196 for: so | Word Duration(ms): 144.0 | Transcription Confidence: 0.5213
86344.0
Polly Queried 197 for: mean | Word Duration(ms): 516.0 | Transcription Confidence: 0.9885
86836.0
Polly Queried 198 for: being | Word Duration(ms): 336.0 | Transcription Confidence: 0.4644
87356.0
Polly Queried 199 for: mistaken | Word Duration(ms): 570.0 | Transcription Confidence: 0.39
87870.0
Polly Queried 200 for: for | Word Duration(ms): 168.0 | Transcription Confidence: 0.5765
88218.0
Polly Queried 201 for: other | Word Duration(ms): 444.0 | Transcription Confidence: 0.9745
88644.0
Polly Queried 203 for: people | Word Duration(ms): 504.0 | Transcription Confidence: 1.0
90074.0
Polly Queried 204 for: would | Word Duration(ms): 180.0 | Transcription Confidence: 0.4719
90180.0
Polly Queried 205 for: at | Word Duration(ms): 192.0 | Transcription Confidence: 0.5898
90342.0
Polly Queried 206 f

Polly Queried 285 for: it | Word Duration(ms): 108.0 | Transcription Confidence: 0.53
116858.0
Polly Queried 286 for: Get | Word Duration(ms): 372.0 | Transcription Confidence: 0.9486
117212.0
Polly Queried 287 for: it | Word Duration(ms): 336.0 | Transcription Confidence: 0.9471
117486.0
Polly Queried 288 for: Go | Word Duration(ms): 252.0 | Transcription Confidence: 0.9764
117682.0
Polly Queried 289 for: right | Word Duration(ms): 336.0 | Transcription Confidence: 0.9978
117976.0
Polly Queried 290 for: foot | Word Duration(ms): 492.0 | Transcription Confidence: 0.6577
118422.0
Polly Queried 292 for: that | Word Duration(ms): 432.0 | Transcription Confidence: 0.9907
120952.0
Polly Queried 293 for: foot | Word Duration(ms): 504.0 | Transcription Confidence: 0.9761
121394.0
Polly Queried 294 for: right | Word Duration(ms): 520.0 | Transcription Confidence: 1.0
122400.0
Polly Queried 295 for: foot | Word Duration(ms): 444.0 | Transcription Confidence: 0.9922
122854.0
Polly Queried 297 fo

Polly Queried 374 for: a | Word Duration(ms): 84.0 | Transcription Confidence: 0.3936
148854.0
Polly Queried 375 for: move | Word Duration(ms): 312.0 | Transcription Confidence: 0.1894
149152.0
Polly Queried 376 for: on | Word Duration(ms): 84.0 | Transcription Confidence: 0.1761
149184.0
Polly Queried 377 for: Shaky | Word Duration(ms): 564.0 | Transcription Confidence: 0.9588
149734.0
Polly Queried 378 for: with | Word Duration(ms): 408.0 | Transcription Confidence: 0.9998
150048.0
Polly Queried 379 for: do | Word Duration(ms): 228.0 | Transcription Confidence: 0.7883
150228.0
Polly Queried 380 for: this | Word Duration(ms): 144.0 | Transcription Confidence: 0.5132
150334.0
Polly Queried 381 for: shit | Word Duration(ms): 288.0 | Transcription Confidence: 0.989
150598.0
Polly Queried 382 for: ourselves | Word Duration(ms): 560.0 | Transcription Confidence: 0.7018
151110.0
Polly Queried 383 for: But | Word Duration(ms): 288.0 | Transcription Confidence: 0.9251
151408.0
Polly Queried 3

Polly Queried 464 for: way | Word Duration(ms): 312.0 | Transcription Confidence: 1.0
184052.0
Polly Queried 465 for: we're | Word Duration(ms): 168.0 | Transcription Confidence: 0.2179
184168.0
Polly Queried 466 for: about | Word Duration(ms): 276.0 | Transcription Confidence: 0.7247
184416.0
Polly Queried 467 for: to | Word Duration(ms): 72.0 | Transcription Confidence: 0.423
184452.0
Polly Queried 468 for: start | Word Duration(ms): 348.0 | Transcription Confidence: 0.3038
184798.0
Polly Queried 469 for: I | Word Duration(ms): 504.0 | Transcription Confidence: 0.4199
185244.0
Polly Queried 470 for: can | Word Duration(ms): 408.0 | Transcription Confidence: 0.3018
185578.0
Polly Queried 471 for: let | Word Duration(ms): 180.0 | Transcription Confidence: 0.3573
185690.0
Polly Queried 472 for: this | Word Duration(ms): 216.0 | Transcription Confidence: 0.9993
185876.0
Polly Queried 473 for: one | Word Duration(ms): 132.0 | Transcription Confidence: 0.8625
185972.0
Polly Queried 474 for

### Processing the Output from Amazon Polly

In [48]:
import boto3
# Downloading files from s3.
s3 = boto3.resource('s3')
my_bucket = s3.Bucket(sagemaker_session.default_bucket())
prefix = "polly-output/"

for object_summary in my_bucket.objects.filter(Prefix=prefix):
    file_name = object_summary.key.split('/')[-1]
    my_bucket.download_file(prefix+ file_name, prefix + file_name)
    
print("Files moved from s3 to repo.")

Files moved from s3 to repo.
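
Once the Polly clips are downloaded, the polly-output folder holds mp3 clips and the JSON silence markers written earlier, all named with a zero-padded index. One possible way to stitch them back together in order is sketched below. The function names are hypothetical, and the sketch assumes that sorting the file names reproduces the playback order and that every JSON file has the `{"length": ms}` shape written above. The pydub import is deferred because it needs ffmpeg.

```python
import json
import os


def plan_timeline(folder):
    """Return ('silence', ms) or ('clip', path) entries in playback order."""
    timeline = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if name.endswith(".json"):
            with open(path, "r", encoding="utf-8") as f:
                timeline.append(("silence", json.load(f)["length"]))
        elif name.endswith(".mp3"):
            timeline.append(("clip", path))
    return timeline


def assemble(folder, output_path):
    """Concatenate the planned timeline into a single mp3 file."""
    from pydub import AudioSegment  # needs ffmpeg/ffprobe on the PATH

    track = AudioSegment.empty()
    for kind, value in plan_timeline(folder):
        if kind == "silence":
            track += AudioSegment.silent(duration=int(value))
        else:
            track += AudioSegment.from_mp3(value)
    track.export(output_path, format="mp3")
```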


In [193]:
print(index)

1036


Mixing Audio:

https://stackoverflow.com/questions/7629873/how-do-i-mix-audio-files-using-python

Pitch Modulation:

https://stackoverflow.com/questions/38923438/does-pydub-support-pitch-modulation
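
A minimal sketch of the frame-rate technique commonly used for pitch shifting with pydub follows. It relies on pydub's private `_spawn` method, so treat it as an assumption that may break between pydub versions. It also changes the clip's speed and duration along with its pitch.

```python
def shifted_frame_rate(frame_rate, octaves):
    """Frame rate that raises (octaves > 0) or lowers (octaves < 0) the pitch."""
    return int(frame_rate * (2.0 ** octaves))


def shift_pitch(sound, octaves):
    """Return a copy of a pydub AudioSegment with its pitch shifted by `octaves`."""
    # Reinterpret the same samples at a different frame rate...
    shifted = sound._spawn(sound.raw_data, overrides={
        "frame_rate": shifted_frame_rate(sound.frame_rate, octaves)})
    # ...then resample so the result plays at the original frame rate.
    return shifted.set_frame_rate(sound.frame_rate)
```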

Create Video:

https://helpdeskgeek.com/windows-10/how-to-record-your-screen-on-windows-10/

Music Video:

https://www.oneimagevideo.com/