# Generate Podcast Synopsis for Various Genres

In [13]:
%load_ext autoreload
%autoreload 2

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Set Up Azure OpenAI

In [14]:
import os
import openai
from dotenv import load_dotenv

# Set up Azure OpenAI
load_dotenv()
openai.api_type = "azure"
openai.api_base = os.getenv("OPENAI_API_BASE")
openai.api_version = "2022-12-01"
openai.api_key = os.getenv("OPENAI_API_KEY")

True
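
The `True` above is the return value of `load_dotenv()`. It does not by itself confirm that both variables read by `os.getenv` are set. A minimal sketch of an explicit check follows; `require_env` is a hypothetical helper, not part of `openai` or `python-dotenv`.

```python
import os

def require_env(*names):
    """Return the values of the given environment variables, or raise if any is unset or empty."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return [os.environ[name] for name in names]
```

In this notebook it could be called as `api_base, api_key = require_env("OPENAI_API_BASE", "OPENAI_API_KEY")` right after `load_dotenv()`.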

## Deploy a Model

In [15]:
# id of desired_model
desired_model = 'text-davinci-003' # suitable for text generation
desired_capability = 'completion'

# List existing deployments and look for one that serves desired_model
deployment_id = None
result = openai.Deployment.list()

for deployment in result.data:
    if deployment["status"] != "succeeded":
        continue
    
    model = openai.Model.retrieve(deployment["model"])

    # check if desired_model is deployed, and if it has 'completion' capability
    if model["id"] == desired_model and model['capabilities'][desired_capability]:
        deployment_id = deployment["id"]
        
# if no model deployed, deploy one
if not deployment_id:
    print(f'No succeeded deployment of "{desired_model}" found.')

    # Deploy the model
    print(f'Creating a new deployment with model: {desired_model}')
    result = openai.Deployment.create(model=desired_model, scale_settings={"scale_type":"standard"})
    deployment_id = result["id"]
    print(f'Successfully created {desired_model} that supports text {desired_capability} with id: {deployment_id}.')
else:
    print(f'Found a succeeded deployment of "{desired_model}" that supports text {desired_capability} with id: {deployment_id}.')

Found a succeeded deployment of "text-davinci-003" that supports text completion with id: text-davinci-003.


## Text Chunk Generator

In [16]:
# A generator that splits a text into smaller chunks of about n tokens, preferably ending at the end of a sentence
def chunk_generator(text, n, tokenizer):
    tokens = tokenizer.encode(text)
    i = 0
    while i < len(tokens):
        # Find the nearest end of sentence within a range of 0.5 * n and 1.5 * n tokens
        j = min(i + int(1.5 * n), len(tokens))
        while j > i + int(0.5 * n):
            # Decode the tokens and check for full stop or newline
            chunk = tokenizer.decode(tokens[i:j])
            if chunk.endswith(".") or chunk.endswith("\n"):
                break
            j -= 1
        # If no end of sentence found, use n tokens as the chunk size
        if j == i + int(0.5 * n):
            j = min(i + n, len(tokens))
        yield tokens[i:j]
        i = j
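
To see how the 0.5 * n to 1.5 * n window behaves without installing `tiktoken`, here is a small sketch. `WordTokenizer` is a hypothetical stand-in that treats each word as one token, and the generator is repeated only so that the snippet runs on its own.

```python
class WordTokenizer:
    """Hypothetical stand-in for a tiktoken encoding: one token per word."""

    def encode(self, text):
        return text.split()

    def decode(self, tokens):
        return " ".join(tokens)


def chunk_generator(text, n, tokenizer):
    # Same logic as the cell above, repeated so this snippet is standalone.
    tokens = tokenizer.encode(text)
    i = 0
    while i < len(tokens):
        # Start at the upper bound of the window and walk back to a sentence end.
        j = min(i + int(1.5 * n), len(tokens))
        while j > i + int(0.5 * n):
            chunk = tokenizer.decode(tokens[i:j])
            if chunk.endswith(".") or chunk.endswith("\n"):
                break
            j -= 1
        # No sentence end inside the window: fall back to n tokens.
        if j == i + int(0.5 * n):
            j = min(i + n, len(tokens))
        yield tokens[i:j]
        i = j


tokenizer = WordTokenizer()
text = "One two three. Four five six seven. Eight nine ten eleven twelve thirteen"
chunks = [tokenizer.decode(c) for c in chunk_generator(text, 4, tokenizer)]
for chunk in chunks:
    print(repr(chunk))
```

With `n = 4`, the first two chunks stop at a full stop. The remaining text has no sentence end, so it is cut after exactly four tokens and the leftover two tokens form the last chunk.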


## Request API

In [17]:
def request_api(document, prompt_postfix, max_tokens):
    prompt = prompt_postfix.replace('<document>',document)
    #print(f'>>> prompt : {prompt}')

    response = openai.Completion.create(
        deployment_id=deployment_id,
        prompt=prompt,
        temperature=0.5,
        max_tokens=max_tokens,
        top_p=1,
        frequency_penalty=1,
        presence_penalty=1,
        stop='###',
    )

    return response['choices'][0]['text']

## Generate Synopsis

In [18]:
def get_synopsis(content, prompt_postfix):
    import tiktoken

    synopsis_chunks = []
    n = 2000 # target number of tokens per chunk
    max_tokens = 1000 # max tokens for each response

    tokenizer = tiktoken.get_encoding('p50k_base')

    # Generate chunks of tokens
    chunks = chunk_generator(content, n, tokenizer)

    # Decode each chunk of tokens back into text
    text_chunks = [tokenizer.decode(chunk) for chunk in chunks]

    # Request one synopsis per chunk
    for chunk in text_chunks:
        synopsis_chunks.append(request_api(chunk, prompt_postfix, max_tokens))
        #print(chunk)
        #print('>>> synopsis: \n' + synopsis_chunks[-1])

    # Join the per-chunk synopses into a single string
    synopsis = ' '.join(synopsis_chunks)

    return synopsis
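
Because `get_synopsis` summarises each chunk independently and then joins the results with a space, a transcript longer than one chunk produces several synopses back to back rather than one merged summary. The sketch below illustrates that flow without calling the API; `join_chunk_synopses` and `fake_summarise` are hypothetical names, and the stub merely returns the first sentence of each chunk.

```python
def join_chunk_synopses(text_chunks, summarise):
    """Summarise each chunk on its own and join the results with spaces."""
    return " ".join(summarise(chunk) for chunk in text_chunks)


def fake_summarise(chunk):
    # Hypothetical stub standing in for request_api: keep only the first sentence.
    return chunk.split(".")[0] + "."


text_chunks = [
    "Tickets are hard to buy. Fees are high.",
    "Passwords got complex. Views are restricted.",
]
synopsis = join_chunk_synopses(text_chunks, fake_summarise)
print(synopsis)
```

Each chunk contributes its own sentence to the result, which is why the synopsis for a long transcript can read like two summaries placed one after the other.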

### Genre : Comedy

With no further information provided in the prompt, the response is rather formal and dry for the genre of comedy.

In [19]:
fname = "../data/comedy-booking-online-transcript.txt"

with open(fname, 'r') as f:
    content = f.readlines()

# convert list to str
content = ' '.join(content) 

In [20]:
# Prompt postfix
prompt_postfix = """ <document>
  \n###
  \nSummarise the transcript of a podcast above into a synopsis. 
  \nSynopsis : 
"""

synopsis = get_synopsis(content, prompt_postfix)
print(synopsis)

Michael McKintyre talks about the difficulties of buying tickets online and how it can be a long process. He mentions how companies are desperate to get your email address and how they make it difficult to opt out from their mailing lists. He also reflects on the good old days when all we needed was one password, but now passwords have become more complex with numbers added in for security reasons. Michael then goes on to talk about booking fees, restricted views, and reselling tickets which he finds frustrating.


Adding more information to the prompt may help achieve more desirable results:
- genre
- desired style

Be creative and explicit about the desired outcome when designing the prompt.
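
One way to keep such prompts consistent is to assemble them from parts. This is only a sketch: `build_prompt_postfix` and its `genre` and `style` parameters are hypothetical, and the wording of the added sentences is an assumption rather than a tested recipe.

```python
def build_prompt_postfix(instruction, genre=None, style=None):
    """Build a prompt template containing the <document> placeholder used by request_api."""
    parts = [instruction.strip()]
    if genre:
        parts.append(f"This is a {genre}.")
    if style:
        parts.append(f"Write it in a {style} style.")
    task = " ".join(parts)
    # Mirror the layout of the notebook's prompts: document, separator, task, cue.
    return f" <document>\n\n###\n\n{task}\n\nSynopsis : \n"


prompt_postfix = build_prompt_postfix(
    "Create a synopsis to capture audience curiosity and heighten anticipation.",
    genre="stand-up comedy",
    style="playful",
)
print(prompt_postfix)
```

The returned string can be passed to `get_synopsis` in place of a hand-written `prompt_postfix`.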

In [21]:
# Prompt postfix
prompt_postfix = """ <document>
  \n###
  \nCreate a synopsis to capture audience curiosity and heighten anticipation. This is a stand-up comedy. 
  \nSynopsis : 
"""
synopsis = get_synopsis(content, prompt_postfix)
print(synopsis)

Michael McKintyre’s Comedy Show is a hilarious stand-up comedy show that will leave you in stitches! Michael takes us on an entertaining journey as he rants about the absurdities of booking tickets online, having to remember passwords and security questions, and even his experience with restricted view seating. Join him for a night full of laughter and entertainment as he shares stories from his own life experiences.


### Genre : Informational

In [25]:
fname = "../data/ft-interview-transcription.txt"

with open(fname, 'r') as f:
    content = f.readlines()

# convert list to str
content = ' '.join(content) 

In [28]:
# Prompt postfix
prompt_postfix = """ <document>
  \n###
  \nGenerate a short synopsis from the transcription of an interview.  
  \nSynopsis : 
"""

synopsis = get_synopsis(content, prompt_postfix)
print(synopsis)

Robert Armstrong, the FT's US financial commentator, discusses the collapse of Silicon Valley Bank and why it is not a repeat of 2008. He explains that two factors led to SVB's downfall: bad decisions at the bank and a rapid increase in interest rates. Rob also examines how SVB was affected by these changes, as well as the implications for other banks in terms of regulation and solvency. Finally, he emphasizes that people should not panic about their deposits being safe if they are under $250,000. Robert Armstrong, an expert on banking regulation, discusses the implications of recent financial issues for banks. He explains that both top-tier and second-tier banks now have more capital than before 2008 and suggests that regulators need to think harder about the threat posed by securities building up on bank balance sheets. He also speaks about maturity transformation as a form of 'black magic' and cautions businesspeople against depositing money in a bank simply because their rich frien

In [29]:
# Prompt postfix
prompt_postfix = """ <document>
  \n###
  \nGenerate a short synopsis from the transcription of an interview, such that it triggers curiosity and includes a thought-provoking question. Add "Let's find out!" at the end.  
  \nSynopsis : 
"""

synopsis = get_synopsis(content, prompt_postfix)
print(synopsis)

Robert Armstrong, the FT's US financial commentator, explains why the collapse of Silicon Valley Bank is not a repeat of 2008. He looks at two factors that were responsible for it - bad decisions by the bank and an increase in interest rates - as well as how its balance sheet was affected by them. Rob also discusses how this incident tells us about US banking regulation and whether things are working as they should or not. What does this moment tell us about our banking system? Let's find out! In this episode, Robert Armstrong from the Financial Times talks about the two-tier system of banking regulation and how it affects banks' balance sheets. He also explains why no regulatory system can prevent all bank failures, and what entrepreneurs and other investors should be asking their banks in the future. What will happen to the banking industry structure as a result of this crisis? Let's find out!
