


This is a sample solution that demonstrates how builders can use Amazon Polly to generate audio in a language different from that of the source audio, generate subtitles from a text file, generate speech marks, and tie them all together.


This demo can be deployed via CDK in your own AWS Account.

Input parameters

pollyLanguageCode - The Amazon Polly language code of the target language for the converted audio. See the Amazon Polly documentation for the list of eligible values.
Solution default value: 'es-US'

pollyVoiceId - The Amazon Polly voice ID to use for the converted audio. See the Amazon Polly documentation for the list of eligible values.
Solution default value: 'Miguel'

pollyEngine - The Amazon Polly engine to use for speech synthesis. Amazon Polly lets you choose a neural or a standard voice through the engine property, which has two possible values: standard or neural.
Solution default value: 'standard'

mediaConvertLangShort - Short language code to be used in the MediaConvert job for captioning.
Solution default value: SPA

mediaConvertLangLong - Long language code to be used in the MediaConvert job for captioning.
Solution default value: Spanish

targetLanguageCode - Language code of the target language in which the audio and subtitles are generated.
Solution default value: es

Deploy solution using default values

cdk deploy

Deploy solution by overriding default values, e.g., to convert a video to Hindi

cdk deploy --parameters pollyLanguageCode=hi-IN --parameters pollyVoiceId=Aditi --parameters pollyEngine=standard --parameters mediaConvertLangShort=HIN --parameters mediaConvertLangLong=Hindi --parameters targetLanguageCode=hi

Note: The deployment takes a few minutes to complete.
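
When several parameters change together, assembling the command by hand is error-prone. The following is a minimal sketch, not part of the solution, that builds the `cdk deploy` command line from a dictionary of overrides. The helper name `build_deploy_command` is hypothetical; the parameter names and default values are the ones listed above.

```python
import shlex

# Default values as listed in the "Input parameters" section.
DEFAULTS = {
    "pollyLanguageCode": "es-US",
    "pollyVoiceId": "Miguel",
    "pollyEngine": "standard",
    "mediaConvertLangShort": "SPA",
    "mediaConvertLangLong": "Spanish",
    "targetLanguageCode": "es",
}


def build_deploy_command(overrides):
    """Return a `cdk deploy` command line for the given parameter overrides.

    Hypothetical helper: it only formats the command, it does not run it.
    """
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Catch misspelled parameter names before they reach the CDK CLI.
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    args = ["cdk", "deploy"]
    for name, value in overrides.items():
        args += ["--parameters", f"{name}={value}"]
    return shlex.join(args)


if __name__ == "__main__":
    hindi = {
        "pollyLanguageCode": "hi-IN",
        "pollyVoiceId": "Aditi",
        "pollyEngine": "standard",
        "mediaConvertLangShort": "HIN",
        "mediaConvertLangLong": "Hindi",
        "targetLanguageCode": "hi",
    }
    print(build_deploy_command(hindi))
```

Run as a script, this prints the same Hindi command shown above. A misspelled parameter name raises a `ValueError` before the command is built.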

How it works


  1. Once the CDK stack is deployed, test the solution by uploading a video (.mp4) to <S3_ROOT_BUCKET>/inputVideo/, e.g., s3://pollyblogcdkstack-pollyblogbucket9110****-*****/inputVideo/.
  2. After the video is uploaded, the S3 object creation event triggers a Lambda function, which starts a state machine execution. To follow the progress of the execution, navigate to Step Functions -> State machines in the AWS console and search for the state machine whose name starts with 'ProcessAudioWithSubtitles'.
  3. The first step of the state machine takes the input video and generates a transcription using Amazon Transcribe.
  4. Once the transcription job is complete, the state machine uses Amazon Translate to translate the text to the target language passed in via the input parameters.
  5. It then starts two parallel flows that generate the audio and the speech marks for the target language passed in via the input parameters.
  6. Once the parallel steps are complete, the final step ties them all together with MediaConvert. A simple MediaConvert settings file is included as part of the codebase.


To remove all deployed resources

cdk destroy

Please make sure to delete S3 buckets that are created as part of the solution to avoid any unnecessary storage costs.


Reagan Rosario (
Matthew Juliana (
Anil Kodali (
Prasanna Saraswathi Krishnan (
Justin Haydt (

If you are interested in contributing, please refer to contributing guidelines here.

