# Summarizing a meeting presentation together with its audio recording

1. The Claude 3 family of models comes with vision capabilities that allow Claude to understand and analyze images, which opens up new possibilities for multimodal interaction.
2. Amazon Transcribe is a fully managed automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capabilities to their applications.
3. With speaker diarization, you can distinguish between different speakers in your transcription output. Amazon Transcribe can differentiate between a maximum of 30 unique speakers and labels the text from each speaker with a unique value.

# Convert the PDF document into images

Using the Python library PyMuPDF, the PDF document is split into PNG images, one per page.
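
The `pdf_to_pngs` helper used below lives in `utils/audioutil.py` and is not shown in this notebook. As a rough sketch of what such a helper might do, the following renders each page of a local PDF with PyMuPDF. The names `page_image_path` and `render_pdf_pages`, the `zoom` parameter, and the local-file input are assumptions made for illustration; the notebook's own helper takes an S3 bucket and key instead.

```python
import os
from typing import List


def page_image_path(out_dir: str, page_number: int) -> str:
    """Return the output path for a 1-based page number, e.g. images/slides/page_1.png."""
    return os.path.join(out_dir, f"page_{page_number}.png")


def render_pdf_pages(pdf_path: str, out_dir: str = "images/slides", zoom: float = 2.0) -> List[bytes]:
    """Render every page of a PDF to PNG, save it to out_dir, and return the PNG bytes."""
    import fitz  # PyMuPDF; imported here so the path helper works without it

    os.makedirs(out_dir, exist_ok=True)
    png_pages = []
    with fitz.open(pdf_path) as doc:
        matrix = fitz.Matrix(zoom, zoom)  # scale up for a sharper raster (assumed value)
        for index, page in enumerate(doc, start=1):
            pix = page.get_pixmap(matrix=matrix)
            pix.save(page_image_path(out_dir, index))
            png_pages.append(pix.tobytes("png"))
    return png_pages
```

A higher `zoom` gives more legible slide text at the cost of larger images, which matters when the images are later sent to a vision model.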

In [1]:
!pip install PyMuPDF==1.24.5
!pip install botocore==1.34.131
!pip install langchain_aws

Collecting botocore==1.34.131
  Using cached botocore-1.34.131-py3-none-any.whl.metadata (5.7 kB)
Using cached botocore-1.34.131-py3-none-any.whl (12.3 MB)
Installing collected packages: botocore
  Attempting uninstall: botocore
    Found existing installation: botocore 1.34.162
    Uninstalling botocore-1.34.162:
      Successfully uninstalled botocore-1.34.162
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
awscli 1.34.3 requires botocore==1.35.3, but you have botocore 1.34.131 which is incompatible.
boto3 1.34.162 requires botocore<1.35.0,>=1.34.162, but you have botocore 1.34.131 which is incompatible.
Successfully installed botocore-1.34.131
Collecting botocore<1.35.0,>=1.34.162 (from boto3<1.35.0,>=1.34.131->langchain_aws)
  Using cached botocore-1.34.162-py3-none-any.whl.metadata (5.7 kB)
Using cached botocore-1.34.162-py3-none-any.whl 

In [2]:
from utils.audioutil import (
    pdf_to_pngs,
    build_slides_narration_prompt,
    get_completion,
    write_string_to_s3,
    write_string_array_to_s3,
    audio_transcribe,
    get_llm,
    convert_transctiption_to_txt_file,
    build_previous_slides_prompt_full,
)

Using modelId: anthropic.claude-3-haiku-20240307-v1:0
Using region:  ap-south-1


In [3]:
bucket_name = 'audio-ppt-summarization-demo'
input_file_key = 'input/Sagemaker_launch.pdf' 
output_file_key = 'output/slide_narration.txt' 
output_array_key = 'output/slide_narration_array.txt' 
output_file_key_audio = 'output/audio.txt' 
output_file_key_summary = 'output/summary.txt' 
output_file_key_summary_langchain = 'output/summary_langchain.txt' 

In [4]:
import os

# Create the output directory for the slide images
imagedir = "images/slides"
os.makedirs(imagedir, exist_ok=True)

In [5]:
# Call the pdf_to_pngs function to convert the PDF to PNG images
print('Generating Images')
pdf_pngs = pdf_to_pngs(bucket_name, input_file_key)
print('Image generation done')

Generating Images
images/slides/page_1.png
images/slides/page_2.png
images/slides/page_3.png
images/slides/page_4.png
images/slides/page_5.png
images/slides/page_6.png
images/slides/page_7.png
images/slides/page_8.png
images/slides/page_9.png
images/slides/page_10.png
images/slides/page_11.png
images/slides/page_12.png
images/slides/page_13.png
images/slides/page_14.png
images/slides/page_15.png
images/slides/page_16.png
images/slides/page_17.png
images/slides/page_18.png
images/slides/page_19.png
images/slides/page_20.png
images/slides/page_21.png
images/slides/page_22.png
images/slides/page_23.png
images/slides/page_24.png
images/slides/page_25.png
images/slides/page_26.png
images/slides/page_27.png
images/slides/page_28.png
images/slides/page_29.png
images/slides/page_30.png
images/slides/page_31.png
images/slides/page_32.png
images/slides/page_33.png
images/slides/page_34.png
images/slides/page_35.png
images/slides/page_36.png
images/slides/page_37.png
images/slides/page_38.png
ima

# Convert images into slide narration

The best way to pass charts and graphs to Claude is to take advantage of its vision capabilities: give Claude an image of the chart or graph, along with a text question about it.
We use Claude 3 Haiku to generate a narration for each slide. To improve accuracy, we also pass the narrations of the previous three slides to the model.
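
The actual `build_slides_narration_prompt` helper is defined in `utils/audioutil.py` and is not shown here. The sketch below illustrates one way a prompt could carry only the last three narrations as context. The function name `build_prompt_with_context`, the `window` parameter, the XML tag names for the context, and the prompt wording are all hypothetical; only the `<narration>` output tag is taken from the notebook, which extracts it with a regular expression.

```python
def build_prompt_with_context(previous_narrations, window=3):
    """Build a narration prompt that includes only the last `window` narrations."""
    recent = previous_narrations[-window:] if window > 0 else []
    # Number the context slides by their position in the deck (1-based).
    first_index = len(previous_narrations) - len(recent) + 1
    context = "\n".join(
        f'<slide_narration id="{first_index + offset}">{text.strip()}</slide_narration>'
        for offset, text in enumerate(recent)
    )
    prompt = "You are narrating a slide deck, one slide at a time.\n"
    if context:
        prompt += (
            "Here are the narrations of the preceding slides:\n"
            f"<previous_slide_narrations>\n{context}\n</previous_slide_narrations>\n"
        )
    prompt += (
        "Describe the attached slide image as if presenting it, "
        "and put the result in <narration> tags."
    )
    return prompt
```

Limiting the context to a fixed window keeps the prompt size roughly constant as the deck grows, while still giving the model enough continuity to avoid repeating itself.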

In [6]:
# Now we use our functions to narrate the entire deck. Note that this may take a few minutes to run (often up to 10).
import re
from tqdm import tqdm
previous_slide_narratives = []
for i, pdf_png in tqdm(enumerate(pdf_pngs)):
    print('Building narrative for : Slide ' + str(i))
    messages = [
        {
            "role": 'user',
            "content": [
                {"text": build_slides_narration_prompt(previous_slide_narratives)},
                {"image": {
                    "format": 'png',  # pdf_to_pngs is expected to return PNG bytes
                    "source": {"bytes": pdf_png }
                    },
                }
            ]
        }
    ]
    completion = get_completion(messages)

    pattern = r"<narration>(.*?)</narration>"
    match = re.search(pattern, completion.strip(), re.DOTALL)
    if match:
        narration = match.group(1)
    else:
        raise ValueError("No narration available.")
    
    previous_slide_narratives.append(narration)
    # If you want to see the narration we produced, uncomment the below line
    # print(narration)
print('Combining narrations')
slide_narration = build_previous_slides_prompt_full(previous_slide_narratives)
print("Generation of slide narration done")

0it [00:00, ?it/s]

Building narrative for : Slide 0


1it [00:01,  1.92s/it]

Building narrative for : Slide 1


2it [00:05,  2.73s/it]

Building narrative for : Slide 2


3it [00:09,  3.49s/it]

Building narrative for : Slide 3


4it [00:13,  3.55s/it]

Building narrative for : Slide 4


5it [00:16,  3.57s/it]

Building narrative for : Slide 5


6it [00:21,  3.78s/it]

Building narrative for : Slide 6


7it [00:25,  4.08s/it]

Building narrative for : Slide 7


8it [00:30,  4.28s/it]

Building narrative for : Slide 8


9it [00:33,  3.79s/it]

Building narrative for : Slide 9


10it [00:36,  3.80s/it]

Building narrative for : Slide 10


11it [00:42,  4.39s/it]

Building narrative for : Slide 11


12it [00:47,  4.66s/it]

Building narrative for : Slide 12


13it [00:54,  5.09s/it]

Building narrative for : Slide 13


14it [00:58,  4.84s/it]

Building narrative for : Slide 14


15it [01:03,  4.96s/it]

Building narrative for : Slide 15


16it [01:08,  4.88s/it]

Building narrative for : Slide 16


17it [01:13,  4.86s/it]

Building narrative for : Slide 17


18it [01:17,  4.80s/it]

Building narrative for : Slide 18


19it [01:22,  4.76s/it]

Building narrative for : Slide 19


20it [01:27,  4.77s/it]

Building narrative for : Slide 20


21it [01:29,  3.95s/it]

Building narrative for : Slide 21


22it [01:32,  3.77s/it]

Building narrative for : Slide 22


23it [01:36,  3.80s/it]

Building narrative for : Slide 23


24it [01:39,  3.71s/it]

Building narrative for : Slide 24


25it [01:43,  3.54s/it]

Building narrative for : Slide 25


26it [01:47,  3.78s/it]

Building narrative for : Slide 26


27it [01:51,  3.79s/it]

Building narrative for : Slide 27


28it [01:56,  4.13s/it]

Building narrative for : Slide 28


29it [01:59,  3.88s/it]

Building narrative for : Slide 29


30it [02:03,  4.06s/it]

Building narrative for : Slide 30


31it [02:08,  4.23s/it]

Building narrative for : Slide 31


32it [02:13,  4.59s/it]

Building narrative for : Slide 32


33it [02:19,  4.73s/it]

Building narrative for : Slide 33


34it [02:24,  4.87s/it]

Building narrative for : Slide 34


35it [02:29,  5.11s/it]

Building narrative for : Slide 35


36it [02:34,  5.07s/it]

Building narrative for : Slide 36


37it [02:39,  4.92s/it]

Building narrative for : Slide 37


38it [02:44,  5.05s/it]

Building narrative for : Slide 38


39it [02:49,  4.80s/it]

Building narrative for : Slide 39


40it [02:52,  4.36s/it]

Building narrative for : Slide 40


41it [02:58,  4.84s/it]

Building narrative for : Slide 41


42it [03:02,  4.56s/it]

Building narrative for : Slide 42


43it [03:04,  3.86s/it]

Building narrative for : Slide 43


44it [03:07,  4.26s/it]

Combining narrations
Generation of slide narration done





In [7]:
# Store the narration array in the S3 bucket
write_string_array_to_s3(bucket_name, output_array_key, previous_slide_narratives)
# Store the combined narration in S3
write_string_to_s3(bucket_name, output_file_key, slide_narration)

# Meeting recording audio to text

Amazon Transcribe generates text from the meeting recording file and identifies the speakers.
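
The `audio_transcribe` helper is not shown in this notebook. As a minimal sketch of how a transcription job with speaker diarization could be started through boto3's `start_transcription_job`, consider the following. The function names, the default output key, and the `en-US` language code are assumptions; the notebook does not state the recording's language or how the helper names its job.

```python
def build_transcription_request(bucket, audio_key, job_name, max_speakers=10,
                                output_key="output/Transcription.json"):
    """Assemble keyword arguments for Transcribe's StartTranscriptionJob call."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{audio_key}"},
        "MediaFormat": audio_key.rsplit(".", 1)[-1].lower(),  # e.g. "m4a"
        "LanguageCode": "en-US",  # assumed; adjust to the recording's language
        "OutputBucketName": bucket,
        "OutputKey": output_key,
        "Settings": {
            "ShowSpeakerLabels": True,         # turn on speaker diarization
            "MaxSpeakerLabels": max_speakers,  # upper bound on distinct speakers
        },
    }


def start_transcription(bucket, audio_key, job_name, max_speakers=10):
    """Start the job; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request builder has no dependencies

    client = boto3.client("transcribe")
    return client.start_transcription_job(
        **build_transcription_request(bucket, audio_key, job_name, max_speakers)
    )
```

The job runs asynchronously, so a caller would typically poll `get_transcription_job` until the status is `COMPLETED` or `FAILED` before reading the output file.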

In [8]:
# Transcribe the meeting audio with Amazon Transcribe
print('--Calling transcribe-----')
result = audio_transcribe(bucket_name, audio_file_key='input/sagemaker.m4a', max_speakers=10)
print("Audio transcription done")

--Calling transcribe-----
sagemaker already exists.
Do you want to override the existing job (Y/N): y
Audio transcription done


In [9]:
# Convert the transcription result JSON in the S3 bucket to text
transcript_content, transcript_output_file = convert_transctiption_to_txt_file(bucket_name, 'output/Transcription.json')

# Store the transcription text in S3
write_string_to_s3(bucket_name, output_file_key_audio, transcript_content)
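
How the conversion helper works is not shown here. The sketch below illustrates one way to turn a Transcribe result with speaker labels into readable `spk_N: text` lines. It assumes the usual layout of a diarized Transcribe output, with word entries under `results.items` and per-speaker segments under `results.speaker_labels.segments`; the function name `transcript_to_speaker_text` is hypothetical.

```python
def transcript_to_speaker_text(transcript):
    """Turn a Transcribe result (with speaker labels) into 'spk_N: text' lines."""
    results = transcript["results"]
    # Map each word's start time to the speaker who said it.
    speaker_at = {
        item["start_time"]: segment["speaker_label"]
        for segment in results["speaker_labels"]["segments"]
        for item in segment["items"]
    }
    lines = []
    current_speaker = None
    for item in results["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            if lines:
                lines[-1] += content  # attach punctuation to the preceding word
            continue
        speaker = speaker_at.get(item["start_time"], current_speaker)
        if speaker != current_speaker or not lines:
            # A new speaker turn starts a new line.
            lines.append(f"{speaker}: {content}")
            current_speaker = speaker
        else:
            lines[-1] += " " + content
    return "\n".join(lines)
```

Keeping the speaker label at the start of each turn lets the summarization model attribute viewpoints and action items to individual participants.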

# Summarize

Finally, claude-sonnet is used to summarize the presentation together with the meeting recording.

In [10]:
# Call LLM to Summarize
print("Calling LLM")
llm = get_llm()

prompt = f"""Provide a detailed summary of the following call transcript :
            {transcript_content}
            and presentation narrative of the powerpoint presentation which was presented during the call:
            {slide_narration}
            into one or more clear and 
            readable paragraphs. Generate a brief summary highlighting the main ideas, viewpoints, or statements.
            At the end of your summary, give a bullet point list of the key action 
            items, to-do's, and followup activities
            """

response_text = llm.invoke(prompt)

# Display the response
print(response_text.content)

# Store summary in S3:
write_string_to_s3(bucket_name, output_file_key_summary, response_text.content)

Calling LLM
Here is a summary of the call transcript and presentation:

The presentation introduces Amazon SageMaker Studio, a new integrated development environment (IDE) for machine learning on AWS. It begins by discussing the importance of machine learning and the challenges companies face in adopting it, such as implementing ML workflows, data preparation, model training, deployment, monitoring, and collaboration. 

Amazon SageMaker was created to help customers overcome these hurdles. SageMaker Studio brings together several new capabilities into a unified web interface to accelerate the ML workflow from data to deployed model. These include collaborative notebooks, experiment tracking, debugging tools, automated model building, and monitoring of live models.

The presenters showcase how SageMaker Studio's visual UI allows data scientists to easily set up experiments, track iterations, compare results, debug issues, and deploy models - all within the same environment. They highlig