In [1]:
from dotenv import load_dotenv
import os

load_dotenv()
openai_key = os.getenv("OPENAI_API_KEY")

In [10]:
MODEL = "gpt-3.5-turbo"
AUDIENCE = "general"

from langchain_openai import ChatOpenAI
# Creating a GPT model for all purposes
# Will change behaviour of the specific instance using the prompt
llm = ChatOpenAI(temperature=0.5, max_tokens=700, model=MODEL)

In [66]:
from PyPDF2 import PdfReader

file_path = "../notebooks/data/Segment-anything-meta-paper.pdf"
pdfreader = PdfReader(file_path)

# Read text from the pdf 
text = ''
for i, page in enumerate(pdfreader.pages):
    content = page.extract_text()
    if content:
        text += content

print(len(text))

167014


In [129]:
text

' And thank you to our listeners for tuning in. Stay tuned for more exciting episodes where we explore the latest innovations in the world of technology. Until next time!'

In [67]:
# Total number of input tokens in the document 
print(llm.get_num_tokens(text))

42020


Since our context window is only 16K, we need to split our document into multiple smaller chunks, process them and them combine the output

In [68]:
# Splitting the text 
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=12000, chunk_overlap=50)
chunks = splitter.create_documents([text])
print(f'Number of chunks: {len(chunks)}')

Number of chunks: 14


In [69]:
chunks[:4]

[Document(page_content='Segment Anything\nAlexander Kirillov1,2,4Eric Mintun2Nikhila Ravi1,2Hanzi Mao2Chloe Rolland3Laura Gustafson3\nTete Xiao3Spencer Whitehead Alexander C. Berg Wan-Yen Lo Piotr Doll ´ar4Ross Girshick4\n1project lead2joint first author3equal contribution4directional lead\nMeta AI Research, FAIR\n(b) Model: Segment Anything Model (SAM)promptimagevalid maskimage encoderprompt encoderlightweight mask decoder\n(a) Task: promptable segmentationsegmentation promptimagemodelcat withblack earsvalid mask\n(c) Data: data engine (top) & dataset (bottom)•1+ billion masks•11 million images •privacy respecting•licensed imagesannotatetraindatamodelSegment Anything 1B (SA-1B):\nFigure 1: We aim to build a foundation model for segmentation by introducing three interconnected components: a prompt-\nable segmentation task, a segmentation model (SAM) that powers data annotation and enables zero-shot transfer to a range\nof tasks via prompt engineering, and a data engine for collecting S

In [83]:
# Create summaries of the document using map reduce method 
from langchain.prompts import PromptTemplate

# Summarization chain
summary_prompt = """
You are an expert in the field of meta-learning. 
You are provided with a text. 
You need to extract the key points from the text. 
The points should be helpful in creating a script for a podcast episode.


TEXT: {text}

Key points:
"""
map_prompt_template = PromptTemplate(input_variables=["text"],
                                template=summary_prompt,)

In [84]:

# Combining all the summaries together 
combine_prompt = """
Provide a final summary of the entire document with these key points 
The overall summary should be helpful in creating a script for a podcast episode.
When dealing with scientific topics, mention the data as well.

Key points: {key_points}
"""

reduce_prompt_template = PromptTemplate(input_variables=["key_points"],
                                template=combine_prompt,)

In [126]:
from langchain.chains.summarize import load_summarize_chain

summary_chain = load_summarize_chain(
    llm = llm, 
    chain_type="map_reduce",
    map_prompt=map_prompt_template,
    map_reduce_document_variable_name="text",
    combine_prompt=reduce_prompt_template,
    combine_document_variable_name="key_points",
    verbose=False
)


output = summary_chain.invoke(chunks)

In [128]:
output['output_text']

"In this podcast episode, we explored the Segment Anything project and the Segment Anything Model (SAM) for image segmentation. SAM is designed to be promptable, allowing for zero-shot transfer to new tasks and distributions. The SA-1B dataset, with over 1 billion masks, showcases SAM's capabilities. SAM has shown promising results in various tasks but still has some limitations. \n\nWe also discussed the importance of evaluating mask quality in image segmentation, highlighting the impact of errors on the final score. Consistency in judgment and adherence to rating criteria are essential for accurate evaluations. Understanding and improving mask quality is crucial in image segmentation tasks. \n\nOverall, the Segment Anything project and SAM model represent significant advancements in image segmentation, offering a powerful tool for prompt-based segmentation tasks and zero-shot transfer learning in computer vision applications."

In [91]:
len(output['output_text'])

1039

In [106]:
llm = ChatOpenAI(temperature=0.7, model=MODEL)

# Podcast chain 
podcast_writer = """
System: You are an expert scriptwriter. You are provided with a summary of a document.
You need to create a coherent and engaging script for a podcast episode based on this summary.
Make it a conversation between two hosts.
Remember, the podcast is for a {AUDIENCE} audience. 
There should at least be 10 dialogues from each person. Make it detailed and as a conversation
Please follow the structure below for the script for a 2 min episode.:

Summary: {summary}

Podcast Script:
Person 1: 
Person 2:

""" 

writer_prompt = PromptTemplate(input_variables=["AUDIENCE", "summary"],
                                template=podcast_writer)
writer_chain = writer_prompt | llm

In [102]:
final_response = writer_chain.invoke({"AUDIENCE": AUDIENCE, 
                                      "summary": output['output_text']})

In [112]:
conversation = []
for dialogue in final_response.content.split("\n\n"):
    person, text = dialogue.split(':')
    conversation.append((person, text))

conversation[:5]

[('Person 1',
  " Welcome back to our podcast, where we dive deep into the world of cutting-edge technology. Today, we're going to explore the fascinating Segment Anything project and the Segment Anything Model, also known as SAM."),
 ('Person 2',
  " That's right! SAM is truly groundbreaking with its promptable design that allows for zero-shot transfer to new tasks and distributions. It's amazing to see how it showcases such strong performance in various downstream tasks."),
 ('Person 1',
  " Absolutely! And let's not forget about the SA-1B dataset for image segmentation, which contains over 1 billion masks on 11 million images. It's a treasure trove of valuable data collected through a model-assisted data collection process."),
 ('Person 2',
  " The sheer scale of the SA-1B dataset is mind-blowing. It's definitely a game-changer for researchers looking to push the boundaries of image segmentation."),
 ('Person 1',
  " And speaking of image segmentation, we also delved into the world 

In [123]:
from elevenlabs import play, save
from elevenlabs.client import ElevenLabs


el_api_key = os.getenv("ELEVEN_LABS_API_KEY")
client = ElevenLabs(api_key=el_api_key)

In [118]:
responses = client.voices.get_all()
len(responses.voices)

45

In [124]:
audios = []
for i, value in enumerate(conversation):
    speaker, line = value
    if speaker == "Person 1":
        voice = "Chris"
    else:
        voice = "Gigi"
    audio = client.generate(text=line, voice=voice, model='eleven_monolingual_v1')
    play(audio)
    audios.append([i, voice, audio])


In [125]:
os.makedirs("audio_files", exist_ok=True)
for i, voice, audio in audios:
    save(audio, f"audio_files/{voice}_{i}.mp3")
    

In [54]:
# Dump refine outputs into a file
import json
with open("refine_outputs.json", "w") as f:
    json.dump(refine_outputs['output_text'], f)

# Load refine outputs from a file
import json
with open("refine_outputs.json", "r") as f:
    refine_outputs = json.load(f)

In [55]:
refine_outputs

"The Segment Anything Model (SAM) is a decoder model developed by the FAIR team of Meta AI for prompt-based segmentation tasks. It predicts multiple masks using various loss functions and incorporates geographic information using named entity recognition and APIs. SAM has been evaluated for segmenting clothing by gender and age, zero-shot transfer capabilities, edge detection, object proposals, and text-to-mask modeling. A variation of SAM, known as SAM, produces a single mask per image with similar performance. The dataset used, Segment Anything 1B (SA-1B), is one of the most geographically diverse segmentation datasets and is intended for research purposes only. Annotators were trained and compensated using a proprietary platform, and the dataset will be released under a license agreement for specific research purposes. SAM has impressive zero-shot performance but may miss fine structures and hallucinate small disconnected components. It was trained on licensed images, and the enviro

In [49]:
llm = ChatOpenAI(temperature=0.7, model=MODEL)

# Podcast chain 
podcast_writer = """
System: You are an expert scriptwriter. You are provided with a summary of a document.
You need to create a coherent and engaging script for a podcast episode based on this summary.
Make it a conversation between two hosts.
Remember, the podcast is for a {AUDIENCE} audience. 
Please follow the structure below for the script for a 2 min episode.:

Summary: {summary}

Podcast Script:
Person 1: 
Person 2:

""" 

writer_prompt = PromptTemplate(input_variables=["AUDIENCE", "summary"],
                                template=podcast_writer)
writer_chain = writer_prompt | llm

In [50]:
final_response = writer_chain.invoke({"AUDIENCE": AUDIENCE, 
                                      "summary": refine_outputs})



In [52]:
final_response.content

"Person 1: Welcome back to our podcast, where we explore the latest advancements in AI technology. Today, we're diving into the world of segmentation models with the Segment Anything Model, or SAM, developed by the FAIR team of Meta AI.\n\nPerson 2: That's right! SAM is a decoder model that is specifically designed for prompt-based segmentation tasks. It's pretty impressive how SAM predicts multiple masks using different loss functions and incorporates geographic information using named entity recognition and APIs.\n\nPerson 1: Absolutely. SAM has been evaluated for various tasks such as segmenting clothing by gender and age, edge detection, and even text-to-mask modeling. It's fascinating to see how this model can be used in different applications.\n\nPerson 2: And let's not forget about SAM+, a variation of SAM that produces a single mask per image with similar performance. The dataset used, Segment Anything 1B (SA-1B), is one of the most geographically diverse segmentation datasets 

"Person 1: Welcome back to our podcast, where we explore the latest advancements in AI technology. Today, we're diving into the world of segmentation models with the Segment Anything Model, or SAM, developed by the FAIR team of Meta AI.\n\nPerson 2: That's right! SAM is a decoder model that is specifically designed for prompt-based segmentation tasks. It's pretty impressive how SAM predicts multiple masks using different loss functions and incorporates geographic information using named entity recognition and APIs.\n\nPerson 1: Absolutely. SAM has been evaluated for various tasks such as segmenting clothing by gender and age, edge detection, and even text-to-mask modeling. It's fascinating to see how this model can be used in different applications.\n\nPerson 2: And let's not forget about SAM+, a variation of SAM that produces a single mask per image with similar performance. The dataset used, Segment Anything 1B (SA-1B), is one of the most geographically diverse segmentation datasets out there.\n\nPerson 1: It's important to note that SAM has impressive zero-shot performance, but there are some limitations, such as missing fine structures and hallucinating small disconnected components. It's crucial to exercise caution when using SAM for zero-shot segmentation tasks.\n\nPerson 2: Definitely. Users are encouraged to run fairness evaluations and exercise judgment when using SAM for downstream tasks. The model's quality can be assessed through an interface that provides different views of the mask for evaluation based on specific criteria.\n\nPerson 1: And let's not forget about the guidelines provided for annotators to review mask quality, including assessing boundary accuracy and avoiding errors. These guidelines are essential for ensuring the accuracy of the masks produced by SAM.\n\nPerson 2: Overall, SAM is a powerful tool for segmentation tasks, but it's essential to use it responsibly and with caution. Make sure to check out the guidelines provided to assess the quality of the masks and make informed decisions when using SAM for your research purposes.\n\nPerson 1: That's all the time we have for today. Thanks for tuning in, and we'll catch you on the next episode of our podcast. Stay curious and keep exploring the world of AI technology."