In [1]:
from dotenv import load_dotenv

from langchain.chat_models import ChatOpenAI
from langchain.text_splitter import NLTKTextSplitter, RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate

load_dotenv()
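# load_dotenv() reads OPENAI_API_KEY from a local .env file; alternatively set it directly:
# import os; os.environ["OPENAI_API_KEY"] = "your_key_here"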

with open("transcript_medium.txt", encoding="utf-8") as f:
    transcript = f.read()

# transcript[:1000]

In [2]:
chat_llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", request_timeout=600)
chat_llm.get_num_tokens(transcript)  # 10,202 tokens, well over gpt-3.5-turbo's 4,096-token context window

10202
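The 4,096-token limit covers the whole request (prompt plus completion), so each chunk has to leave room for the prompt instructions and the model's reply. A minimal arithmetic sketch of the resulting lower bound on the number of chunks follows; `min_chunks_needed` is a hypothetical helper, and the prompt-overhead and reply-reserve figures are illustrative assumptions, not values measured in this notebook.

```python
import math


def min_chunks_needed(total_tokens: int, context_window: int,
                      prompt_overhead: int, output_reserve: int) -> int:
    """Lower bound on how many chunks a text must be split into.

    Each map call has to fit the prompt instructions, one chunk and the
    model's reply inside the context window, so only
    context_window - prompt_overhead - output_reserve tokens remain for
    transcript text. Chunk overlap is ignored; it can only raise the count.
    """
    budget = context_window - prompt_overhead - output_reserve
    if budget <= 0:
        raise ValueError("prompt and reply leave no room for transcript text")
    return math.ceil(total_tokens / budget)


# 10,202 tokens comes from the cell above; the 200-token prompt overhead and
# 1,000-token reply reserve are illustrative assumptions.
print(min_chunks_needed(10_202, 4_096, prompt_overhead=200, output_reserve=1_000))  # → 4
```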

In [3]:
# https://python.langchain.com/en/latest/modules/indexes/text_splitters.html

# NLTK
nltk_splitter = NLTKTextSplitter(chunk_size=15000, chunk_overlap=1000)
nltk_texts = nltk_splitter.split_text(transcript)
print(nltk_texts[0])
len(nltk_texts)

# RecursiveCharacterTextSplitter
recursive_splitter = RecursiveCharacterTextSplitter(
    separators=[
        "\n\n",
        "",  # https://github.com/hwchase17/langchain/issues/1663#issuecomment-1469161790
    ],
    chunk_size=5000,
    chunk_overlap=500,
)
recursive_texts = recursive_splitter.split_text(transcript)
print(recursive_texts[0])
len(recursive_texts)

# Going with NLTK: it splits on sentence boundaries, so chunks keep whole sentences

This is Atomic Soul, where we connect with the visionary mind shaping the future of the tech industry.

I'm your host Jepson Taylor and my mission is to fight for you, our listeners, as we chase the ideas that will help us stay ahead in a world of rapid innovation.

We'll journey through the realms of tech, AI, and the human brain, discovering how to live a more satisfying life in the face of relentless progress.

Let's forge a path into the future together.

Sol, is that how you pronounce your name?

It is.

It is.

And you're one of the first to get it right.

I've heard Sal, Sul, Sol.

Oh, yeah.

Well, it's very fitting because the name of this podcast is Atomic Soul.

I promise that's not why I tried to invite you onto this podcast.

But I met you in Riyadh a couple of weeks ago at LEAP.

Lots of people there.

It was it was a crazy conference.

And I think in a good way.

But I would love for the listeners to understand what has your journey been to get to where you are now.

And 

10
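The reason given for choosing NLTK (chunks end on sentence boundaries) can be illustrated with a simplified, dependency-free sketch. This is not LangChain's code: `split_sentences` is a crude regex stand-in for NLTK's sentence tokenizer, and `chunk_sentences` is a hypothetical helper that only approximates how `chunk_size` and `chunk_overlap` (both counted in characters here) are applied when sentences are merged into chunks.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Crude stand-in for NLTK's sentence tokenizer: split after ., ! or ?
    # followed by whitespace. Real tokenizers also handle abbreviations etc.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_sentences(sentences: list[str], chunk_size: int, chunk_overlap: int,
                    sep: str = " ") -> list[str]:
    """Greedily pack whole sentences into chunks of at most chunk_size chars.

    When a chunk is full, sentences are dropped from its front until at most
    chunk_overlap characters remain (and the next sentence fits); that tail
    is carried into the next chunk. A single sentence longer than chunk_size
    becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    length = 0  # characters in sep.join(current)
    for sent in sentences:
        added = len(sent) + (len(sep) if current else 0)
        if current and length + added > chunk_size:
            chunks.append(sep.join(current))
            # Trim from the front until the retained tail fits the overlap
            # budget and the new sentence fits the chunk.
            while current and (length > chunk_overlap
                               or length + len(sep) + len(sent) > chunk_size):
                length -= len(current[0]) + (len(sep) if len(current) > 1 else 0)
                current.pop(0)
            added = len(sent) + (len(sep) if current else 0)
        current.append(sent)
        length += added
    if current:
        chunks.append(sep.join(current))
    return chunks


print(chunk_sentences(split_sentences("Aaaa. Bbbb. Cccc. Dddd."),
                      chunk_size=11, chunk_overlap=5))
# → ['Aaaa. Bbbb.', 'Bbbb. Cccc.', 'Cccc. Dddd.']
```

In this sketch the overlap is carried over as whole sentences too, so neighbouring chunks can share fewer than `chunk_overlap` characters, but no sentence is ever cut in half.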

In [4]:
docs = nltk_splitter.create_documents([transcript])
docs[0]
num_docs = len(docs)
num_tokens_first_doc = chat_llm.get_num_tokens(docs[0].page_content)
print(
    f"Now we have {num_docs} documents and the first one has {num_tokens_first_doc} tokens"
)

Now we have 4 documents and the first one has 3340 tokens
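This cell only measures the first chunk. A small sketch that checks every chunk against a token limit follows; `oversized_chunks` and `rough_token_count` are hypothetical helpers, and the roughly-4-characters-per-token rule is only an approximation for English text.

```python
from typing import Callable, Sequence


def oversized_chunks(chunks: Sequence[str], count_tokens: Callable[[str], int],
                     limit: int) -> list[tuple[int, int]]:
    """Return (index, token_count) for every chunk whose count exceeds limit."""
    report = []
    for i, chunk in enumerate(chunks):
        n = count_tokens(chunk)
        if n > limit:
            report.append((i, n))
    return report


def rough_token_count(text: str) -> int:
    # Rough heuristic: about 4 characters per token for English text.
    # An exact tokenizer-based counter can be passed in instead.
    return max(1, len(text) // 4)


print(oversized_chunks(["a" * 40, "b" * 8], rough_token_count, limit=5))
```

In the notebook, an exact counter could be passed instead, e.g. `oversized_chunks([d.page_content for d in docs], chat_llm.get_num_tokens, limit=...)`, choosing a limit that leaves room for the map prompt and the model's reply.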


In [5]:
map_prompt = """
The following is part of an intimate conversation between two individuals:
"{text}"
You are tasked to take notes - these will be used as teaching materials.
Document any major life events, career and achievements, beliefs and values, 
impact and legacy, personal traits and characteristics, and any other information that
would be essential when creating these lessons. 
These notes should be detailed and include relevant quotes where possible (they
have to be full sentences).
Exclude any mention of sponsors.
Do not hallucinate or make up any information.
NOTES AND QUOTES:
"""
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])

combine_prompt = """
You are an expert in writing biographies and capturing the motivations, emotions, 
successes, failures, and everyday realities of individuals.
From the following text delimited by triple backquotes, write a concise summary that 
contains multiple sections. 
```{text}```
Each section is an insightful life lesson. If lessons are similar, combine the sections.
Supplement each section with quotes that best capture the essence of the lesson.
Once all sections are complete, combine sections based on lesson similarity and overlaps. 
Rank each section for its insight, depth, transformative potential and practicality.
Sections containing more details and quotes should be ranked higher.
From this ranking, select the top 10 sections and discard the rest. 
Don't include the final ranking in the summary.
Use bullet points for each section.
Use a tone that reflects empathy, authenticity, understanding, and encouragement.
Do not hallucinate or make up any information.
CONCISE SUMMARY WITH MAXIMUM 10 SECTIONS:
"""

combine_prompt_template = PromptTemplate(
    template=combine_prompt, input_variables=["text"]
)

custom_summary_chain = load_summarize_chain(
    llm=chat_llm,
    chain_type="map_reduce",
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=False,
)

custom_output = custom_summary_chain.run(docs)
print(custom_output)

1. Resourcefulness and Problem-Solving
- "Every problem has a solution."
- "I realized that I was good at fixing problems."
- "They became a problem solver and has a knack for finding solutions to problems."

2. Discipline and Rigor
- "Values discipline and rigor."
- "Developed discipline and rigor to compensate for lack of natural smarts and charisma."

3. Craving for Achievement
- "Has a craving to do it all and are constantly pushing themselves to achieve more."
- "The individual values achievement and loves the feeling of seeing others have an 'aha' moment."

4. Understanding Business Language
- "Understanding the business language is probably the most important thing that no one teaches you."
- "You must understand the business language and if you want to have a seat at the table."

5. Resilience and Hustle
- "Values resilience, hustle, and internal strength."
- "The individual is confident and secure."
- "The individual is resilient and hustles to achieve her goals."

6. Empoweri
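With `chain_type="map_reduce"`, the map prompt is applied to each chunk separately and the combine prompt is then applied once to the joined notes. A simplified sketch of that flow follows, assuming `llm` is any function from a prompt string to a completion string; `map_reduce_summarize` and `echo_llm` are hypothetical names. LangChain's own chain can additionally collapse the intermediate notes when they are too long for a single combine call, which this sketch omits.

```python
from typing import Callable


def map_reduce_summarize(chunks: list[str], llm: Callable[[str], str],
                         map_prompt: str, combine_prompt: str,
                         sep: str = "\n\n") -> str:
    # Map: run the note-taking prompt on each chunk independently.
    notes = [llm(map_prompt.format(text=chunk)) for chunk in chunks]
    # Reduce: join all notes and run the combine prompt once over the result.
    return llm(combine_prompt.format(text=sep.join(notes)))


# Usage with a stand-in "LLM" that just echoes its prompt (illustration only).
calls: list[str] = []


def echo_llm(prompt: str) -> str:
    calls.append(prompt)
    return prompt


result = map_reduce_summarize(["chunk A", "chunk B"], echo_llm,
                              "NOTES({text})", "SUMMARY[{text}]")
print(result)
print(len(calls))  # 2 map calls + 1 combine call
```

For N chunks this makes N + 1 model calls, so settings that produce more documents (smaller chunks or larger overlaps) also mean more calls and more cost.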

In [6]:
num_docs

4

In [7]:
# https://python.langchain.com/en/latest/modules/indexes/text_splitters.html

# NLTK
nltk_splitter = NLTKTextSplitter(chunk_size=10000, chunk_overlap=1000)
nltk_texts = nltk_splitter.split_text(transcript)
print(nltk_texts[0])
len(nltk_texts)

# RecursiveCharacterTextSplitter
recursive_splitter = RecursiveCharacterTextSplitter(
    separators=[
        "\n\n",
        "",  # https://github.com/hwchase17/langchain/issues/1663#issuecomment-1469161790
    ],
    chunk_size=5000,
    chunk_overlap=500,
)
recursive_texts = recursive_splitter.split_text(transcript)
print(recursive_texts[0])
len(recursive_texts)

# Going with NLTK: it splits on sentence boundaries, so chunks keep whole sentences

10

In [8]:
docs = nltk_splitter.create_documents([transcript])
docs[0]
num_docs = len(docs)
num_tokens_first_doc = chat_llm.get_num_tokens(docs[0].page_content)
print(
    f"Now we have {num_docs} documents and the first one has {num_tokens_first_doc} tokens"
)

Now we have 5 documents and the first one has 2234 tokens


In [9]:
# https://python.langchain.com/en/latest/modules/indexes/text_splitters.html

# NLTK
nltk_splitter = NLTKTextSplitter(chunk_size=10000, chunk_overlap=2000)
nltk_texts = nltk_splitter.split_text(transcript)
print(nltk_texts[0])
len(nltk_texts)

# RecursiveCharacterTextSplitter
recursive_splitter = RecursiveCharacterTextSplitter(
    separators=[
        "\n\n",
        "",  # https://github.com/hwchase17/langchain/issues/1663#issuecomment-1469161790
    ],
    chunk_size=5000,
    chunk_overlap=500,
)
recursive_texts = recursive_splitter.split_text(transcript)
print(recursive_texts[0])
len(recursive_texts)

# Going with NLTK: it splits on sentence boundaries, so chunks keep whole sentences

10

In [10]:
docs = nltk_splitter.create_documents([transcript])
docs[0]
num_docs = len(docs)
num_tokens_first_doc = chat_llm.get_num_tokens(docs[0].page_content)
print(
    f"Now we have {num_docs} documents and the first one has {num_tokens_first_doc} tokens"
)

Now we have 6 documents and the first one has 2234 tokens
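The document counts across the three runs (4, then 5, then 6) follow from how `chunk_size` and `chunk_overlap` interact: each chunk after the first adds roughly `chunk_size - chunk_overlap` new characters, so shrinking the chunks or growing the overlap both increase the count. A rough estimate that treats the splitter as a sliding window follows; `estimate_chunk_count` is a hypothetical helper, and the 45,000-character transcript length is an assumption, since the notebook does not report the character count.

```python
import math


def estimate_chunk_count(text_len: int, chunk_size: int, chunk_overlap: int) -> int:
    """Approximate chunk count for a sliding window over text_len characters.

    The first chunk covers chunk_size characters; every later chunk adds
    about chunk_size - chunk_overlap new ones. Sentence-aware splitters pack
    whole sentences, so their real counts can differ somewhat.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    if text_len <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap
    return 1 + math.ceil((text_len - chunk_size) / stride)


# Hypothetical 45,000-character transcript with the three settings tried above.
for size, overlap in [(15_000, 1_000), (10_000, 1_000), (10_000, 2_000)]:
    print(size, overlap, estimate_chunk_count(45_000, size, overlap))
```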


In [11]:
map_prompt = """
The following is part of an intimate conversation between two individuals:
"{text}"
You are tasked to take notes - these will be used as teaching materials.
Document any major life events, career and achievements, beliefs and values, 
impact and legacy, personal traits and characteristics, and any other information that
would be essential when creating these lessons. 
These notes should be detailed and include relevant quotes where possible (they
have to be full sentences).
Exclude any mention of sponsors.
Do not hallucinate or make up any information.
NOTES AND QUOTES:
"""
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])

combine_prompt = """
You are an expert in writing biographies and capturing the motivations, emotions, 
successes, failures, and everyday realities of individuals.
From the following text delimited by triple backquotes, write a concise summary that 
contains multiple sections. 
```{text}```
Each section is an insightful life lesson. If lessons are similar, combine the sections.
Supplement each section with quotes that best capture the essence of the lesson.
Once all sections are complete, combine sections based on lesson similarity and overlaps. 
Rank each section for its insight, depth, transformative potential and practicality.
Sections with more details and quotes should be ranked higher.
From this ranking, select the top 10 sections and discard the rest. 
Don't include the final ranking in the summary.
Use bullet points for each section.
Use a tone that reflects empathy, authenticity, understanding, and encouragement.
Do not hallucinate or make up any information.
CONCISE SUMMARY WITH MAXIMUM 10 SECTIONS:
"""

combine_prompt_template = PromptTemplate(
    template=combine_prompt, input_variables=["text"]
)

custom_summary_chain = load_summarize_chain(
    llm=chat_llm,
    chain_type="map_reduce",
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=False,
)

custom_output = custom_summary_chain.run(docs)
print(custom_output)

1. Overcoming Insecurities and Finding Confidence
- Soul struggled with confidence in her earlier years but found her niche in problem-solving and gained confidence through discipline and rigor in her studies.
- "I just had the discipline and I couldn't make up for it with natural smarts and charisma."
2. Embracing Failure and Learning from Mistakes
- Soul celebrates mistakes and failures as opportunities to learn and grow.
- "Some people think the road to success is linear and it just goes up and forward. When in actuality, it's just a bunch of squiggles and you have a bunch of dips and peaks and valleys and all of the above."
3. Balancing Priorities and Accepting Limitations
- Soul believes in prioritizing mental and physical health and balancing priorities while accepting that not everything can have equal priority.
- "There's just so much to do and I have a craving to do it all."
4. Empowering Team Members and Building Relationships
- Soul believes in empowering team members to sol