# Real Time Voice Cloning

In this example, we submit a sample batch processing job to SageMaker. Make sure you run ./scripts/install.sh first. If you have the AWS CLI installed, you should be able to run this notebook locally. The transform() task, however, will still run in the cloud on a GPU instance.

In [1]:
bucket_name = ''              # <-- Your bucket name goes here

In [2]:
import boto3
import sagemaker as sage
import json

In [3]:
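# Create a SageMaker session and look up the account and region it is bound to
sess = sage.Session()
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
# Batch transformer for the 'voice-cloning-recall' model: one ml.p2.xlarge (GPU) instance
trans = sage.transformer.Transformer('voice-cloning-recall', 1, 'ml.p2.xlarge')
# S3 resource used to upload the job file and download the results
s3 = boto3.resource('s3')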

## Sample Job

Here, we set up our batch processing job to clone two sample voices. Let's have Darth Vader and Morgan Freeman read some novel passages. Of course, you could do this with different utterance files or text.

In [4]:

hp_text = ["Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much.",
           "They were the last people you'd expect to be involved in anything strange or mysterious, because they just didn't hold with such nonsense.",
           "Mr. Dursley was the director of a firm called Grunnings, which made drills.",
           "He was a big, beefy man with hardly any neck, although he did have a very large mustache.",
           "Mrs. Dursley was thin and blonde and had nearly twice the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbors.",
           "The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere."
          ]

two_cities_text = ["It was the best of times, it was the worst of times",
                   "it was the age of wisdom, it was the age of foolishness",
                   "it was the epoch of belief, it was the epoch of incredulity"
                  ]

job_json = {}
job_json['request_id'] = 'test'
job_json['request_type'] = 'batch_processing'

job1 = {}
job1['job_id'] = 'freeman_hp'
job1['bucket'] = bucket_name
job1['utterance_file'] = 'ssre-normal.wav'
job1['sentences'] = hp_text

job2 = {}
job2['job_id'] = 'freeman_two_cities'
job2['bucket'] = bucket_name
job2['utterance_file'] = 'ssre-normal.wav'
job2['sentences'] = two_cities_text

job3 = {}
job3['job_id'] = 'vader_hp'
job3['bucket'] = bucket_name
job3['utterance_file'] = 'darth.mp3'
job3['sentences'] = hp_text

job4 = {}
job4['job_id'] = 'vader_two_cities'
job4['bucket'] = bucket_name
job4['utterance_file'] = 'darth.mp3'
job4['sentences'] = two_cities_text

job_json['jobs'] = [job1, job2, job3, job4]
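
The four jobs above are every combination of two voices and two texts, so the list can also be generated rather than written out by hand. The following is a minimal sketch of that idea. The helper name `build_jobs` and the `voices` and `texts` mappings are hypothetical and not part of the original notebook; the dictionary keys are the ones used in the cell above.

```python
from itertools import product


def build_jobs(bucket, voices, texts):
    """Return one job dict per (voice, text) pair.

    voices: maps a short voice label to its utterance file name.
    texts:  maps a short text label to a list of sentences.
    (Both mappings are illustrative assumptions, not part of the notebook.)
    """
    jobs = []
    for (voice, utterance_file), (text_label, sentences) in product(
        voices.items(), texts.items()
    ):
        jobs.append({
            'job_id': f'{voice}_{text_label}',
            'bucket': bucket,
            'utterance_file': utterance_file,
            'sentences': list(sentences),
        })
    return jobs


# Placeholder data; in the notebook these would be hp_text and two_cities_text.
voices = {'freeman': 'ssre-normal.wav', 'vader': 'darth.mp3'}
texts = {
    'hp': ['First sentence.', 'Second sentence.'],
    'two_cities': ['Another sentence.'],
}
example_jobs = build_jobs('my-bucket', voices, texts)
```

With the notebook's own variables, `job_json['jobs'] = build_jobs(bucket_name, voices, {'hp': hp_text, 'two_cities': two_cities_text})` should produce the same four jobs in the same order.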

In [5]:
# Add our job as a json file to S3
with open('./data/sample_job.json', 'w') as f:
    f.write(json.dumps(job_json, indent=4))

s3.Object(bucket_name, 'sample_job.json').upload_file('./data/sample_job.json')
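
Because `bucket_name` starts out as an empty string, it may be worth checking the job specification before uploading it and paying for a GPU instance. The sketch below is one possible check. The function name `find_job_problems` is hypothetical, and the set of required keys is an assumption inferred from the sample job above, not a schema published by the container.

```python
# Assumption: these are the keys every job needs, inferred from the sample job.
REQUIRED_JOB_KEYS = {'job_id', 'bucket', 'utterance_file', 'sentences'}


def find_job_problems(spec):
    """Return a list of human-readable problems found in a job specification."""
    problems = []
    for i, job in enumerate(spec.get('jobs', [])):
        missing = REQUIRED_JOB_KEYS - job.keys()
        if missing:
            problems.append(f'job {i}: missing keys {sorted(missing)}')
        if not job.get('bucket'):
            problems.append(f'job {i}: bucket is empty')
        if not job.get('sentences'):
            problems.append(f'job {i}: no sentences')
    return problems


# Example: a job whose bucket name was never filled in.
bad_spec = {'jobs': [{'job_id': 'test', 'bucket': '',
                      'utterance_file': 'darth.mp3', 'sentences': ['Hello.']}]}
bad_spec_problems = find_job_problems(bad_spec)
```

Calling `find_job_problems(job_json)` before the upload and stopping if the list is non-empty would catch a forgotten bucket name early.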

In [6]:
# Start the job. It should take several minutes, although most of that time is spent starting the container.
trans.transform(f's3://{bucket_name}/sample_job.json', content_type='application/json')
trans.wait()

..................................
2020/04/22 21:53:13 [notice] 10#10: using the "epoll" event method
2020/04/22 21:53:13 [notice] 10#10: nginx/1.14.0 (Ubuntu)
2020/04/22 21:53:13 [notice] 10#10: OS: Linux 4.14.165-103.209.amzn1.x86_64
2020/04/22 21:53:13 [notice] 10#10: getrlimit(RLIMIT_NOFILE): 65536:99999
2020/04/22 21:53:13 [notice] 10#10: start worker processes
2020/04/22 21:53:13 [notice] 10#10: start worker process 12
2020/04/22 21:53:13 [crit] 12#12: *1 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or directory) while connecting to upstream, client: 169.254.255.130, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock:/ping", host: "169.254.255.131:8080"
169.254.255.130 - - [22/Apr/2020:21:53:13 +0000] "GET /ping HTTP/1.1" 502 182 "-" "Go-http-client/1.1"
2020/04/22 21:53:14 [crit] 12#12: *3 connect() to unix:/tmp/gunicorn.sock failed (2: No such file or dire

In [7]:
# When the job is finished, all of the result files should be in S3. Here, we'll just download them.
for f in s3.Bucket(bucket_name).objects.all():
    file_name = f.key
    if 'vader' in file_name or 'freeman' in file_name:
        print(f'Downloading: {file_name}')
        s3.Bucket(bucket_name).download_file(file_name, f'./data/{file_name}')

Downloading: freeman_hp_0.wav
Downloading: freeman_hp_1.wav
Downloading: freeman_hp_2.wav
Downloading: freeman_hp_3.wav
Downloading: freeman_hp_4.wav
Downloading: freeman_hp_5.wav
Downloading: freeman_two_cities_0.wav
Downloading: freeman_two_cities_1.wav
Downloading: freeman_two_cities_2.wav
Downloading: vader_hp_0.wav
Downloading: vader_hp_1.wav
Downloading: vader_hp_2.wav
Downloading: vader_hp_3.wav
Downloading: vader_hp_4.wav
Downloading: vader_hp_5.wav
Downloading: vader_two_cities_0.wav
Downloading: vader_two_cities_1.wav
Downloading: vader_two_cities_2.wav
