## OCI Speech (Text to Speech)


Helpful links
- https://github.com/oracle/oci-python-sdk/tree/22fd62c8dbbd1aaed6b75754ec1ba8a3c16a4e5a/src/oci/ai_speech
- https://docs.oracle.com/en-us/iaas/Content/speech/home.htm
- Slack channels: #oci_speech_service_users or #igiu-innovation-lab
- If you hit errors running the sample code, ask for help in #igiu-ai-learning

## Import Libraries

In [1]:
from oci.ai_speech import AIServiceSpeechClient
from oci.ai_speech.models import *
from oci.config import from_file
from oci.signer import load_private_key_from_file
import oci
import os

### Update variables to match your environment

In [2]:
CONFIG_PROFILE = "AISANDBOX"  # change to the profile name in your ~/.oci/config

## Set input variables

In [3]:
  
endpoint = "https://speech.aiservice.us-phoenix-1.oci.oraclecloud.com"
compartmentId = "ocid1.compartment.oc1..aaaaaaaaxj6fuodcmai6n6z5yyqif6a36ewfmmovn42red37ml3wxlehjmga"

filename = "tts.mp3"  

# Supported output formats
# - TtsOracleSpeechSettings.OUTPUT_FORMAT_PCM
# - TtsOracleSpeechSettings.OUTPUT_FORMAT_MP3
# - TtsOracleSpeechSettings.OUTPUT_FORMAT_JSON
# - TtsOracleSpeechSettings.OUTPUT_FORMAT_OGG
outputFormat = TtsOracleSpeechSettings.OUTPUT_FORMAT_MP3
  
   
# Sample rate of the generated speech, in Hz.
sampleRateInHz = 22050

# Speech mark types to return when the output format is JSON.
# This field is ignored for any other output format.
# The output JSON contains every speech mark type listed here.
speechMarkTypes = [
    TtsOracleSpeechSettings.SPEECH_MARK_TYPES_WORD,
    TtsOracleSpeechSettings.SPEECH_MARK_TYPES_SENTENCE,
]

# Set this value to True to enable streaming.
# With streaming, the response is sent back in chunks, so you don't
# have to wait for the entire audio to be generated before you start using it.
isStreamEnabled = True
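
Because `filename` and `outputFormat` are set separately above, it is easy to end up with an `.mp3` file that actually holds OGG or JSON data. The sketch below shows one way to derive the extension from the format name. The helper `output_filename` and the `EXTENSIONS` table are hypothetical, and the plain string keys are an assumption about what the `OUTPUT_FORMAT_*` constants contain, so compare them against the SDK before relying on this.

```python
# Hypothetical helper, not part of the OCI SDK.
# Assumption: the OUTPUT_FORMAT_* constants are plain strings such as "MP3".
EXTENSIONS = {
    "MP3": ".mp3",
    "OGG": ".ogg",
    "PCM": ".pcm",
    "JSON": ".json",
}


def output_filename(stem, output_format):
    """Return stem plus the extension that matches output_format."""
    key = str(output_format).upper()
    if key not in EXTENSIONS:
        raise ValueError(f"Unsupported output format: {output_format!r}")
    return stem + EXTENSIONS[key]


print(output_filename("tts", "MP3"))
```

In the notebook this could replace the hard-coded name, for example `filename = output_filename("tts", outputFormat)`.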
   

## Variables to experiment with 

In [4]:
# The input text can also be passed in SSML format.
# https://confluence.oci.oraclecorp.com/x/8Jcgvw
textType = TtsOracleSpeechSettings.TEXT_TYPE_TEXT
#textType = TtsOracleSpeechSettings.TEXT_TYPE_SSML

# SSML tags: https://docs.oracle.com/en-us/iaas/Content/speech/using/using-tts.htm
text = "A paragraph is a series of sentences that are organized and coherent, and are all related to a single topic. Almost every piece of writing you do that is longer than a few sentences should be organized into paragraphs."
#text = """
#<speak>
#  <sub alias="see oh two over see oh">CO2/CO </sub>  is made up of molecules that each have one carbon atom  double bonded to two oxygen
#</speak>
#"""

# If you choose a natural voice, use the natural model, and vice versa.
voiceId = "Brian"  # natural voices: Brian, Annabelle, Bob, Stacy, Cindy, Phil
#voiceId = "Stacy"  # standard voices: Bob, Stacy, Cindy, Phil

voice_model_details = TtsOracleTts2NaturalModelDetails(voice_id=voiceId)

#voice_model_details = TtsOracleTts1StandardModelDetails(voice_id=voiceId)
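
When the text comes from user input or an LLM, hand-writing SSML like the commented sample above can produce invalid markup if the text contains `&`, `<`, or quotes. The following is a minimal sketch of building the string with escaping, using only the `<speak>` and `<sub alias="...">` tags that appear in the sample. The helper names `sub` and `speak` are hypothetical, and the tag support should be checked against the SSML documentation linked above.

```python
from xml.sax.saxutils import escape, quoteattr


def sub(alias, shown):
    """Build a <sub> element that is read aloud as `alias`."""
    return f"<sub alias={quoteattr(alias)}>{escape(shown)}</sub>"


def speak(*parts):
    """Wrap already-escaped parts in a <speak> root element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml_text = speak(
    sub("see oh two", "CO2"),
    escape(" has one carbon atom & two oxygen atoms."),
)
print(ssml_text)
```

To use the result in this notebook, assign it to `text` and switch `textType` to `TtsOracleSpeechSettings.TEXT_TYPE_SSML`.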


## Load OCI config
Set up authentication for OCI by reading the configuration from a file and creating a signer instance for secure API communication. The default configuration file location is `~/.oci/config`.

In [5]:
config = oci.config.from_file('~/.oci/config', profile_name=CONFIG_PROFILE)

## Create AI service speech client

In [6]:
private_key = load_private_key_from_file(config['key_file'])

# Build a request signer from the values in the selected config profile.
signer = oci.signer.Signer(
    tenancy=config["tenancy"],
    user=config["user"],
    fingerprint=config["fingerprint"],
    private_key_file_location=config["key_file"],
)

speech_client = AIServiceSpeechClient(
    config=config,
    signer=signer,
    service_endpoint=endpoint,
)
    

## Create speech synthesis details

In [7]:
speech_details = SynthesizeSpeechDetails(
    text=text,
    is_stream_enabled=isStreamEnabled,
    compartment_id=compartmentId,
    configuration=TtsOracleConfiguration(
        model_details=voice_model_details,
        speech_settings=TtsOracleSpeechSettings(
            text_type=textType,
            sample_rate_in_hz=sampleRateInHz,
            output_format=outputFormat,
            speech_mark_types=speechMarkTypes,
        ),
    ),
)

## Synthesize the speech


In [8]:
response = speech_client.synthesize_speech(speech_details)
if response.status != 200:
    print(f'Request failed with {response.status}')


## Save the audio

In [9]:
# Write the audio to disk chunk by chunk as the response arrives.
with open(filename, 'wb') as f:
    for b in response.data.iter_content():
        f.write(b)
print(f"TTS files saved at {filename} in {os.getcwd()}")

TTS files saved at tts.mp3 in /Users/ashish/work/code/python/workshop/speech


## Exercise: Generate a conversation between two people

1. Prompt an LLM to generate a transcript of a conversation between two people in JSON format
    * Customer service agent
    * Customer wanting to return an item
1. Use two different voices to convert each sentence into an MP3 file
1. Combine the individual voice fragments into one file
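
One possible starting point for the exercise is sketched below. It covers only the parts that do not call OCI: parsing the transcript, assigning a voice to each speaker, and joining the fragment files. The transcript shape (`speaker` and `text` keys), the `VOICES` mapping, and the helpers `plan_fragments` and `combine_fragments` are all hypothetical. Joining MP3 files byte for byte is an assumption that holds for many players but not all, so an audio library may be the safer choice for the final step.

```python
import json
from pathlib import Path

# Hypothetical transcript shape; ask the LLM to return exactly this JSON.
transcript_json = """
[
  {"speaker": "agent", "text": "Thank you for calling. How can I help?"},
  {"speaker": "customer", "text": "I would like to return an item."}
]
"""

# One voice per speaker; both names come from the voice list in this notebook.
VOICES = {"agent": "Brian", "customer": "Stacy"}


def plan_fragments(raw, voices):
    """Return one (index, voice_id, text) tuple per turn, in order."""
    plan = []
    for i, turn in enumerate(json.loads(raw)):
        speaker = turn["speaker"]
        if speaker not in voices:
            raise KeyError(f"No voice configured for speaker {speaker!r}")
        plan.append((i, voices[speaker], turn["text"]))
    return plan


def combine_fragments(paths, out_path):
    """Concatenate fragment files byte for byte and return the total size."""
    with open(out_path, "wb") as out:
        for path in paths:
            out.write(Path(path).read_bytes())
    return Path(out_path).stat().st_size


plan = plan_fragments(transcript_json, VOICES)
for index, voice_id, sentence in plan:
    # Here you would synthesize `sentence` with `voice_id`
    # and save it as, for example, f"fragment_{index}.mp3".
    print(index, voice_id, sentence)
```

Each tuple in `plan` corresponds to one call to `synthesize_speech` with a different `voice_id`; once every fragment is saved, `combine_fragments` can be pointed at the saved files in order.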
