- Load text from a web page
- Summarize the page
- Use an output parser
- Use Gradio (to build a chatbot interface)

In [1]:
from dotenv import load_dotenv
load_dotenv()

True

In [24]:
from langchain.document_loaders import UnstructuredURLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

In [5]:
# Load the article text from a URL
url = "https://sports.yahoo.com/why-the-nets-felt-the-need-to-move-on-from-jacque-vaughn-223103428.html"
loader = UnstructuredURLLoader(urls=[url])
data = loader.load()  # returns a list of Document objects

In [7]:
print(data)

[Document(page_content="Yahoo Sports\n\nWhy the Nets felt the need to move on from Jacque Vaughn\n\nJake Fischer\n\nSenior NBA reporter\n\n5 min read\n\nLink Copied\n\nTo understand why Jacque Vaughn was fired as the Brooklyn Nets’ head coach Monday morning, start by recalling how and why Vaughn rose to the first chair on Brooklyn’s bench back in November 2022.\n\nBarring strokes of fortune, in a business that offers little luck to precious few, Vaughn’s path always seemed to represent a stop-gap solution for Brooklyn’s front office. After dismissing Steve Nash two weeks into last season with Kevin Durant’s summer trade request still echoing off the Barclays Center rafters, the Nets ultimately pivoted to Vaughn instead of their prioritized candidate, then-embattled Ime Udoka, as Nash’s replacement — all while the franchise managed Kyrie Irving’s prolonged controversy stemming from his promotion of an antisemitic film. Vaughn dutifully served as Brooklyn's interim coach when the Nets pa

In [9]:
text_splitter = RecursiveCharacterTextSplitter(
    separators=["Link Copied"],  # split where the article body begins
    chunk_size=1000,
    chunk_overlap=20,
    length_function=len,
)
# Load the page again and split it in one step
data = loader.load_and_split(text_splitter=text_splitter)
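
The effect of splitting on the "Link Copied" marker can be illustrated without LangChain. The sketch below is a simplified stand-in, not the `RecursiveCharacterTextSplitter` algorithm (which also handles `chunk_size` and `chunk_overlap`); `page_text` and `split_keep_separator` are hypothetical names, and the sample text is a shortened version of the page content printed above.

```python
import re

# Hypothetical, shortened stand-in for the loaded page: header, marker, article body.
page_text = (
    "Yahoo Sports\n\nJake Fischer\n\n5 min read\n\n"
    "Link Copied\n\n"
    "To understand why Jacque Vaughn was fired ..."
)


def split_keep_separator(text: str, separator: str) -> list[str]:
    """Split before each occurrence of `separator`, keeping it at the start of the next piece."""
    pieces = re.split(f"(?={re.escape(separator)})", text)
    # Drop empty pieces and surrounding whitespace.
    return [piece.strip() for piece in pieces if piece.strip()]


chunks = split_keep_separator(page_text, "Link Copied")
print(len(chunks))      # number of pieces
print(chunks[1][:11])   # the second piece starts with the marker itself
```

The first piece holds the page header and the second holds the article body, which is why the notebook keeps only the element at index 1.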

In [13]:
# The article body is the second chunk (index 1); the first chunk holds the page header
doc = data[1]
doc

Document(page_content="Link Copied\n\nTo understand why Jacque Vaughn was fired as the Brooklyn Nets’ head coach Monday morning, start by recalling how and why Vaughn rose to the first chair on Brooklyn’s bench back in November 2022.\n\nBarring strokes of fortune, in a business that offers little luck to precious few, Vaughn’s path always seemed to represent a stop-gap solution for Brooklyn’s front office. After dismissing Steve Nash two weeks into last season with Kevin Durant’s summer trade request still echoing off the Barclays Center rafters, the Nets ultimately pivoted to Vaughn instead of their prioritized candidate, then-embattled Ime Udoka, as Nash’s replacement — all while the franchise managed Kyrie Irving’s prolonged controversy stemming from his promotion of an antisemitic film. Vaughn dutifully served as Brooklyn's interim coach when the Nets parted ways with Kenny Atkinson in May 2020, and he did so again in the aftermath of Nash’s ouster, helping the Nets rebound from a 

In [21]:
# Summarize the text.
# chain_type options:
#   stuff      - combine all segments into one document and pass it to the LLM
#                in a single call; fails when the combined text is too large
#                for the model (this is the default)
#   map_reduce - summarize each segment separately, then pass the combined
#                summaries to the LLM again for a final summary
#   refine     - summarize the first segment, then pass the intermediate answer
#                together with each following segment to the LLM to improve it
llm = OpenAI(max_tokens=1500)
chain = load_summarize_chain(llm=llm, chain_type="stuff")
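
The three chain types described in the comments differ mainly in how many model calls they make and what each call sees. The following is a minimal sketch of that control flow, assuming a hypothetical `fake_llm` stand-in instead of a real model; the function names and prompt wording are illustrative and are not LangChain's internals.

```python
from typing import Callable, List

LLM = Callable[[str], str]


def summarize_stuff(docs: List[str], llm: LLM) -> str:
    # stuff: concatenate every segment and make a single call
    return llm("Summarize:\n" + "\n\n".join(docs))


def summarize_map_reduce(docs: List[str], llm: LLM) -> str:
    # map: summarize each segment on its own
    partial = [llm("Summarize:\n" + doc) for doc in docs]
    # reduce: summarize the combined partial summaries
    return llm("Combine these summaries:\n" + "\n".join(partial))


def summarize_refine(docs: List[str], llm: LLM) -> str:
    # start from a summary of the first segment
    answer = llm("Summarize:\n" + docs[0])
    # improve the running answer with each following segment
    for doc in docs[1:]:
        answer = llm(f"Existing summary:\n{answer}\n\nRefine it using:\n{doc}")
    return answer


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model: records the prompt and returns a placeholder."""
    calls.append(prompt)
    return f"<summary {len(calls)}>"


segments = ["segment one", "segment two", "segment three"]


def count_calls(strategy: Callable[[List[str], LLM], str]) -> int:
    """Run one strategy on the sample segments and report how many model calls it made."""
    calls.clear()
    strategy(segments, fake_llm)
    return len(calls)


print(count_calls(summarize_stuff))       # 1
print(count_calls(summarize_map_reduce))  # 4
print(count_calls(summarize_refine))      # 3
```

With a single document, as in this notebook, all three strategies reduce to roughly one call, so the default is a reasonable choice here.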

In [22]:
summary = chain.run([doc])
print(summary)



Jacque Vaughn was fired as the Brooklyn Nets' head coach due to the team's underperformance and the front office's desire for a more experienced coach. Despite initial success, Vaughn's lack of playoff experience and inability to lead the team to a higher seed led to his dismissal. Kevin Ollie is expected to take over as interim coach, but the long-term coaching search will be a crucial decision for the franchise. There is speculation about the future of general manager Sean Marks, but he remains well-regarded by the team's owners. Ultimately, the Nets are looking for a coach with postseason experience to guide the team to success.


In [23]:
# The chain can also produce the summary in another language
# if we replace the default prompt.
# The prompt below is in Chinese: "Summarize the content of this news article ... Summary:"
prompt_template = """总结这段新闻的内容

"{text}"

总结:"""

translate_prompt = PromptTemplate(
    template=prompt_template, input_variables=["text"])

translate_chain = load_summarize_chain(
    llm=llm, prompt=translate_prompt)

translate_chain.run([doc])

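' 杰克·沃恩作为布鲁克林篮网队主教练被解雇的原因是因为球队在本赛季表现不佳。球队本来希望沃恩能够带领球队重返季后赛，但在赛季中期，球队的成绩开始下滑，最终无缘季后赛。球队管理层决定解雇沃恩，由前NBA球员凯文·奥利接任暂时教练。球队希望能够找到一位有着丰富季后赛经验的教练来指导球队，以帮助球队重返争冠的道路。球队管理层也将继续寻找最合适的教练来带领球队。'

(English translation of the output: Jacque Vaughn was fired as head coach of the Brooklyn Nets because the team performed poorly this season. The team had hoped Vaughn could lead it back to the playoffs, but its results began to slide mid-season and it ultimately missed the playoffs. Management decided to fire Vaughn, with former NBA player Kevin Ollie taking over as interim coach. The team hopes to find a coach with extensive playoff experience to help it return to championship contention. Management will also keep looking for the most suitable coach to lead the team.)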
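
The custom prompt works because the document text is substituted for the `{text}` placeholder before the prompt reaches the model. A minimal sketch of that substitution using plain `str.format` follows; `build_prompt`, the English template, and the sample sentence are hypothetical and only mirror the shape of the template above.

```python
# English template with the same shape as the notebook's prompt (an assumption for illustration).
prompt_template = """Summarize this news article in one sentence.

"{text}"

Summary:"""


def build_prompt(template: str, docs: list[str]) -> str:
    """Join the documents and substitute them for the {text} placeholder."""
    return template.format(text="\n\n".join(docs))


# Hypothetical stand-in for the article text.
prompt = build_prompt(prompt_template, ["The Nets dismissed their head coach on Monday."])
print(prompt)
```

Changing only the instruction line of the template is enough to change the language or style of the summary, which is what the Chinese prompt does.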

In [38]:
# Next, rewrite the summary as a conversation between two people

openaichat = ChatOpenAI(model_name="gpt-3.5-turbo")

template = """
I will provide a news summary. Please follow the instruction to rewrite the summary into conversation between two NBA fans.

news summary: "{summary}"

instruction: "{instruction}"
"""

chat_prompt = PromptTemplate.from_template(template=template)

instruction = "characters are Bob and Peter, start by greeting each other, and then make a conversation about the news"

msg = [
    HumanMessage(content=chat_prompt.format(summary=summary, instruction=instruction))
]

res = openaichat(msg)

print(res)

content="Bob: Hey Peter, did you hear about Jacque Vaughn getting fired as the Nets' head coach?\n\nPeter: Yeah, I can't say I'm surprised. The team's underperformance was definitely disappointing.\n\nBob: I agree. It's a tough decision for the front office, but they want a coach with more experience to lead the team.\n\nPeter: Kevin Ollie as interim coach is an interesting choice, but they need to make a solid long-term decision.\n\nBob: Definitely. The Nets need someone with playoff experience to take them to the next level. It'll be interesting to see who they choose.\n\nPeter: I'm also curious about the future of general manager Sean Marks. Do you think he'll stay with the team?\n\nBob: I think he's well-regarded by the owners, so he might stick around. But ultimately, they need a strong coaching staff to guide the team to success.\n\nPeter: Yeah, it'll be a crucial decision for the franchise. Let's see how it all plays out."


In [39]:
# Print only the message text so the line breaks render properly
print(res.content)

Bob: Hey Peter, did you hear about Jacque Vaughn getting fired as the Nets' head coach?

Peter: Yeah, I can't say I'm surprised. The team's underperformance was definitely disappointing.

Bob: I agree. It's a tough decision for the front office, but they want a coach with more experience to lead the team.

Peter: Kevin Ollie as interim coach is an interesting choice, but they need to make a solid long-term decision.

Bob: Definitely. The Nets need someone with playoff experience to take them to the next level. It'll be interesting to see who they choose.

Peter: I'm also curious about the future of general manager Sean Marks. Do you think he'll stay with the team?

Bob: I think he's well-regarded by the owners, so he might stick around. But ultimately, they need a strong coaching staff to guide the team to success.

Peter: Yeah, it'll be a crucial decision for the franchise. Let's see how it all plays out.
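
The reply above is free text, so extracting who said what requires string handling. The sketch below shows one hand-rolled way to do it with a regular expression, assuming every line has the form `Name: message`; `split_script` is a hypothetical helper and `reply` reuses the first two lines of the output above. This approach breaks as soon as the model changes its formatting, which is the motivation for asking for structured output instead.

```python
import re

# First two lines of the model reply shown above.
reply = (
    "Bob: Hey Peter, did you hear about Jacque Vaughn getting fired as the Nets' head coach?\n\n"
    "Peter: Yeah, I can't say I'm surprised. The team's underperformance was definitely disappointing."
)


def split_script(text: str) -> list[tuple[str, str]]:
    """Return (speaker, message) pairs for every line shaped like 'Name: message'."""
    lines = []
    for raw in text.splitlines():
        match = re.match(r"^(\w+):\s*(.+)$", raw.strip())
        if match:
            lines.append((match.group(1), match.group(2)))
    return lines


script = split_script(reply)
for speaker, content in script:
    print(speaker, "->", content)
```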


### Parser

In [40]:
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

In [41]:
class Line(BaseModel):
    character: str = Field(description="person name")
    content: str = Field(description="detailed chat message excluding person name")

class Conversation(BaseModel):
    script: list[Line] = Field(description="a conversation")

In [None]:
parser = PydanticOutputParser(pydantic_object=Conversation)

In [None]:
template = """
I will provide a news summary. Please follow the instruction to rewrite the summary into conversation between two NBA fans.

news summary: "{summary}"

instruction: "{instruction}"

{output_format}
"""

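# Pre-fill output_format with the parser's format instructions;
# summary and instruction are supplied when the prompt is formatted.
chat_prompt = PromptTemplate(
    template=template,
    input_variables=["summary", "instruction"],
    partial_variables={"output_format": parser.get_format_instructions()},
)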
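
Once the model follows the format instructions, its reply should be JSON that matches the `Conversation` schema, and the parser turns that JSON into objects. The sketch below imitates that step with the standard library only: dataclasses stand in for the Pydantic models, and `parse_conversation` and `raw_reply` are hypothetical. In the notebook, the counterpart would be the parser's own `parse` method applied to the model reply.

```python
import json
from dataclasses import dataclass


@dataclass
class Line:
    character: str  # person name
    content: str    # chat message excluding the person name


@dataclass
class Conversation:
    script: list[Line]


def parse_conversation(raw: str) -> Conversation:
    """Decode a JSON reply and build a Conversation; raises if a required field is missing."""
    payload = json.loads(raw)
    lines = [
        Line(character=str(item["character"]), content=str(item["content"]))
        for item in payload["script"]
    ]
    return Conversation(script=lines)


# Hypothetical model reply that follows the requested format.
raw_reply = json.dumps({
    "script": [
        {"character": "Bob", "content": "Hey Peter, did you hear the news?"},
        {"character": "Peter", "content": "Yeah, I can't say I'm surprised."},
    ]
})

conversation = parse_conversation(raw_reply)
for line in conversation.script:
    print(f"{line.character}: {line.content}")
```

Each line of the conversation is now an object with separate `character` and `content` fields, which is convenient for feeding a chat interface.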