In [6]:
from dotenv import load_dotenv

from langchain.chat_models import ChatOpenAI
from langchain.text_splitter import NLTKTextSplitter, RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate

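load_dotenv()  # loads OPENAI_API_KEY from a local .env file
# Alternatively, set the key directly (requires `import os`):
# os.environ["OPENAI_API_KEY"] = "your_key_here"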

with open("transcript_medium.txt", encoding="utf-8") as f:
    transcript = f.read()

# transcript[:1000]

In [7]:
chat_llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", request_timeout=600)
chat_llm.get_num_tokens(transcript)  # 10,202 tokens; exceeds gpt-3.5-turbo's 4,096-token limit

10202
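Since the transcript does not fit in a single request, it helps to estimate how many chunks are needed at minimum. The sketch below is plain arithmetic, not a LangChain call: `min_chunks` is a hypothetical helper, and the 1,000-token reserve for the prompt text and the model's reply is an assumption, not a measured value.

```python
import math


def min_chunks(total_tokens: int, context_window: int, reserved_tokens: int) -> int:
    """Lower bound on the number of chunks needed, ignoring chunk overlap."""
    budget = context_window - reserved_tokens  # tokens left for transcript text
    if budget <= 0:
        raise ValueError("reserved_tokens must be smaller than context_window")
    return math.ceil(total_tokens / budget)


# 10,202 and 4,096 come from the cell above; 1,000 is an assumed reserve.
needed = min_chunks(total_tokens=10_202, context_window=4_096, reserved_tokens=1_000)
print(needed)
```

Chunk overlap repeats some text across chunks, so the real number can be higher than this lower bound.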

In [8]:
# https://python.langchain.com/en/latest/modules/indexes/text_splitters.html

# NLTK (sentence-aware; chunk_size and chunk_overlap are measured in characters, not tokens)
nltk_splitter = NLTKTextSplitter(chunk_size=15000, chunk_overlap=1000)
nltk_texts = nltk_splitter.split_text(transcript)
print(nltk_texts[0])
len(nltk_texts)

This is Atomic Soul, where we connect with the visionary mind shaping the future of the tech industry.

I'm your host Jepson Taylor and my mission is to fight for you, our listeners, as we chase the ideas that will help us stay ahead in a world of rapid innovation.

We'll journey through the realms of tech, AI, and the human brain, discovering how to live a more satisfying life in the face of relentless progress.

Let's forge a path into the future together.

Sol, is that how you pronounce your name?

It is.

It is.

And you're one of the first to get it right.

I've heard Sal, Sul, Sol.

Oh, yeah.

Well, it's very fitting because the name of this podcast is Atomic Soul.

I promise that's not why I tried to invite you onto this podcast.

But I met you in Riyadh a couple of weeks ago at LEAP.

Lots of people there.

It was it was a crazy conference.

And I think in a good way.

But I would love for the listeners to understand what has your journey been to get to where you are now.

And 

4

In [9]:
# RecursiveCharacterTextSplitter
recursive_splitter = RecursiveCharacterTextSplitter(
    separators=[
        "\n\n",
        "",  # https://github.com/hwchase17/langchain/issues/1663#issuecomment-1469161790
    ],
    chunk_size=5000,
    chunk_overlap=500,
)
recursive_texts = recursive_splitter.split_text(transcript)
print(recursive_texts[0])
len(recursive_texts)

This is Atomic Soul, where we connect with the visionary mind shaping the future of the tech industry. I'm your host Jepson Taylor and my mission is to fight for you, our listeners, as we chase the ideas that will help us stay ahead in a world of rapid innovation. We'll journey through the realms of tech, AI, and the human brain, discovering how to live a more satisfying life in the face of relentless progress. Let's forge a path into the future together. Sol, is that how you pronounce your name? It is. It is. And you're one of the first to get it right. I've heard Sal, Sul, Sol. Oh, yeah. Well, it's very fitting because the name of this podcast is Atomic Soul. I promise that's not why I tried to invite you onto this podcast. But I met you in Riyadh a couple of weeks ago at LEAP. Lots of people there. It was it was a crazy conference. And I think in a good way. But I would love for the listeners to understand what has your journey been to get to where you are now. And you're also chemi

10

I'll go with the NLTK text splitter, since it keeps sentences intact when the text is split into chunks.
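To illustrate what "sentence structure is maintained" means, here is a minimal sketch of sentence-aware chunking. It is not the `NLTKTextSplitter` implementation: `split_sentences` is a naive regex stand-in for a real sentence tokenizer, `pack_sentences` is a hypothetical helper, and chunk overlap is left out for brevity.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive stand-in for a sentence tokenizer: split on whitespace
    # that follows ".", "!" or "?".
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def pack_sentences(sentences: list[str], chunk_size: int, separator: str = "\n\n") -> list[str]:
    """Greedily pack whole sentences into chunks of at most chunk_size characters.

    A single sentence longer than chunk_size still becomes its own
    (oversized) chunk, because sentences are never cut.
    """
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        added = len(sentence) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            chunks.append(separator.join(current))  # close the full chunk
            current, current_len = [], 0
            added = len(sentence)
        current.append(sentence)
        current_len += added
    if current:
        chunks.append(separator.join(current))
    return chunks


sample = (
    "Sol, is that how you pronounce your name? It is. "
    "And you're one of the first to get it right. I've heard Sal, Sul, Sol."
)
chunks = pack_sentences(split_sentences(sample), chunk_size=60)
for chunk in chunks:
    print(repr(chunk))
```

Every chunk starts and ends on a sentence boundary, which a fixed-width character split cannot guarantee.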

In [10]:
docs = nltk_splitter.create_documents([transcript])
# docs[0]
num_docs = len(docs)
num_tokens_first_doc = chat_llm.get_num_tokens(docs[0].page_content)
print(
    f"Now we have {num_docs} documents and the first one has {num_tokens_first_doc} tokens"
)

Now we have 4 documents and the first one has 3340 tokens


Each of the 4 documents is summarized separately using `map_prompt`, producing four intermediate summaries.  
In this prompt I aim for a balance: notes that condense each chunk without being too terse, since a final summary step follows.

In [11]:
map_prompt = """
The following is part of an intimate conversation between two individuals:
"{text}"
You are tasked to take notes - these will be used as teaching materials.
Document any major life events, career and achievements, beliefs and values, 
impact and legacy, personal traits and characteristics, and any other information that
would be essential when creating these lessons. 
These notes should be detailed and include relevant quotes where possible (they
have to be full sentences).
Exclude any mention of sponsors.
Do not hallucinate or make up any information.
NOTES AND QUOTES:
"""
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])

The four summaries are then combined into one final summary using `combine_prompt`.  
Here I explicitly ask the model to summarize.
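The two-stage flow can be sketched in a few lines of plain Python. This is a simplified illustration, not LangChain's implementation: `map_reduce_summarize` and `fake_llm` are hypothetical names, the templates are shortened placeholders, and the real chain may do more (for example, condensing intermediate outputs that are too long).

```python
from typing import Callable


def map_reduce_summarize(
    chunks: list[str],
    llm: Callable[[str], str],
    map_template: str,
    combine_template: str,
) -> str:
    # Map: one independent call per chunk.
    notes = [llm(map_template.format(text=chunk)) for chunk in chunks]
    # Reduce: a single call over all intermediate notes joined together.
    return llm(combine_template.format(text="\n\n".join(notes)))


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    # Stand-in for a chat model: records the prompt and returns a numbered note.
    calls.append(prompt)
    return f"note {len(calls)}"


result = map_reduce_summarize(
    ["part A", "part B", "part C", "part D"],
    fake_llm,
    map_template="Take notes on: {text}",
    combine_template="Summarize: {text}",
)
print(len(calls), "calls; final output:", result)
```

With four chunks this makes five model calls in total: four for the map step and one for the combine step.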

In [12]:
combine_prompt = """
You are an expert in writing biographies and capturing the motivations, emotions, 
successes, failures, and everyday realities of individuals.
From the following text delimited by triple backquotes, write a concise summary that 
contains multiple sections. 
```{text}```
Each section is an insightful life lesson. If lessons are similar, combine the sections.
Supplement each section with quotes that best capture the essence of the lesson.
Once all sections are complete, combine sections based on lesson similarity and overlaps. 
Rank each section for its insight, depth, transformative potential and practicality.
Sections containing more details and quotes should be ranked higher.
From this ranking, select the top 10 sections and discard the rest. 
Don't include the final ranking in the summary.
Use bullet points for each section.
Use a tone that reflects empathy, authenticity, understanding, and encouragement.
Do not hallucinate or make up any information.
CONCISE SUMMARY WITH MAXIMUM 10 SECTIONS:
"""

combine_prompt_template = PromptTemplate(
    template=combine_prompt, input_variables=["text"]
)

In [13]:
custom_summary_chain = load_summarize_chain(
    llm=chat_llm,
    chain_type="map_reduce",
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=False,
)

custom_output = custom_summary_chain.run(docs)
print(custom_output)

1. Sol's Journey: A Series of Squiggles
- Sol's life and career have been filled with ups and downs, and she describes it as a series of squiggles rather than a linear path to success.
- "My journey has been a series of squiggles, not a straight line."
- "Life is not a straight line, it's a series of squiggles."
- "It's okay to have a squiggly path, as long as you keep moving forward."

2. Importance of Emotional Intelligence in the Business World
- Sol emphasizes the importance of EQ (emotional intelligence) in navigating the complexities of the business world.
- "Emotional intelligence is key in the business world."
- "EQ is just as important as IQ in the business world."
- "Understanding and managing emotions is crucial for success in the business world."

3. Mapping Activities Back to Business and Creating Value
- Sol believes in mapping the work employees do back to the business and creating value.
- "Every activity should be mapped back to the business and create value."
- "Creat