In [1]:
PDF_FILE = "paul.pdf"

# Llama 3.1 8B (the default `llama3.1` tag in Ollama) computes the embeddings;
# the chat model is configured separately further down.
MODEL = "llama3.1"

In [2]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(PDF_FILE)
pages = loader.load()

print(f"Number of pages: {len(pages)}")
print(f"Length of a page: {len(pages[1].page_content)}")
print("Content of a page:", pages[1].page_content)

Number of pages: 9
Length of a page: 3217
Content of a page: 10% a week. And while 110 may not seem much better than 100,if you keep growing at 10% a week you'll be surprised how bigthe numbers get. After a year you'll have 14,000 users, and after2 years you'll have 2 million.You'll be doing different things when you're acquiring users athousand at a time, and growth has to slow down eventually. Butif the market exists you can usually start by recruiting usersmanually and then gradually switch to less manual methods. [3]Airbnb is a classic example of this technique. Marketplaces are sohard to get rolling that you should expect to take heroic measuresat first. In Airbnb's case, these consisted of going door to door inNew York, recruiting new users and helping existing ones improvetheir listings. When I remember the Airbnbs during YC, I picturethem with rolly bags, because when they showed up for tuesdaydinners they'd always just flown back from somewhere.FragileAirbnb now seems like an 

In [3]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100)

chunks = splitter.split_documents(pages)
print(f"Number of chunks: {len(chunks)}")
print(f"Length of a chunk: {len(chunks[1].page_content)}")
print("Content of a chunk:", chunks[1].page_content)


Number of chunks: 32
Length of a chunk: 1494
Content of a chunk: July 2013One of the most common types of advice we give at Y Combinatoris to do things that don't scale. A lot of would-be founders believethat startups either take off or don't. You build something, makeit available, and if you've made a better mousetrap, people beat apath to your door as promised. Or they don't, in which case themarket must not exist. [1]Actually startups take off because the founders make them takeoff. There may be a handful that just grew by themselves, butusually it takes some sort of push to get them going. A goodmetaphor would be the cranks that car engines had before theygot electric starters. Once the engine was going, it would keepgoing, but there was a separate and laborious process to get itgoing.RecruitThe most common unscalable thing founders have to do at thestart is to recruit users manually. Nearly all startups have to. Youcan't wait for users to come to you. You have to go out and getthe
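`chunk_size` caps each chunk's length in characters. `chunk_overlap` lets neighboring chunks share some text, so content near a boundary shows up in both of them.

The sketch below is a simplified fixed-width splitter that cuts plain character windows. `split_fixed` is a hypothetical helper, not LangChain code. `RecursiveCharacterTextSplitter` works differently: it prefers to break on paragraph breaks, then newlines, then spaces. That is why its chunks, like the 1,494-character one above, usually come out slightly shorter than `chunk_size`.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Simplified illustration
    only; it ignores the separator-aware logic of RecursiveCharacterTextSplitter.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end
            break
    return chunks


# Toy input: 26 letters, windows of 10 characters with 3 characters of overlap.
letters = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_fixed(letters, chunk_size=10, chunk_overlap=3):
    print(chunk)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each chunk starts with the last three characters of the previous one ("hij", "opq", "vwx").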

In [4]:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings(model=MODEL)
vectorstore = FAISS.from_documents(chunks, embeddings)
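Conceptually, `FAISS.from_documents` does two things. It embeds every chunk with `OllamaEmbeddings` and stores the resulting vectors in an index. At query time, the question is embedded the same way and the closest stored vectors are returned.

The brute-force sketch below models that lookup under a few assumptions:
- It uses hypothetical 3-dimensional toy vectors; real Llama 3.1 embeddings have thousands of dimensions.
- It ranks by Euclidean (L2) distance, which LangChain's FAISS wrapper uses by default.
- It performs exact search over a flat index.

The `k=4` default mirrors `as_retriever()`, which returns four documents unless `search_kwargs={"k": ...}` says otherwise.

```python
import math


def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line (L2) distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: list[float], index: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return the ids of the k stored vectors closest to the query (exact search)."""
    ranked = sorted(index, key=lambda doc_id: euclidean(query, index[doc_id]))
    return ranked[:k]


# Hypothetical toy embeddings; in the notebook these come from OllamaEmbeddings.
index = {
    "chunk-recruit": [0.9, 0.1, 0.0],
    "chunk-delight": [0.1, 0.9, 0.2],
    "chunk-notes": [0.0, 0.2, 0.9],
}
query = [0.2, 0.8, 0.1]
print(top_k(query, index, k=2))  # ['chunk-delight', 'chunk-recruit']
```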

In [5]:
retriever = vectorstore.as_retriever()
retriever.invoke("What can you get away with when you only have a small number of users?")

[Document(metadata={'source': 'paul.pdf', 'page': 7}, page_content="started are not merely a necessary evil, but change the companypermanently for the better. If you have to be aggressive aboutuser acquisition when you're small, you'll probably still beaggressive when you're big. If you have to manufacture your ownhardware, or use your software on users's behalf, you'll learnthings you couldn't have learned otherwise. And mostimportantly, if you have to work hard to delight users when youonly have a handful of them, you'll keep doing it when you have alot."),
 Document(metadata={'source': 'paul.pdf', 'page': 0}, page_content='Want to start a startup? Get funded by Y Combinator.'),
 Document(metadata={'source': 'paul.pdf', 'page': 7}, page_content='Notes[1] Actually Emerson never mentioned mousetraps specifically. Hewrote "If a man has good corn or wood, or boards, or pigs, tosell, or can make better chairs or knives, crucibles or churchorgans, than anybody else, you will find a broad h

In [6]:
from dotenv import load_dotenv
from langchain_groq import ChatGroq

load_dotenv()

model = ChatGroq(model="llama3-70b-8192", temperature=0)
model.invoke("Who is the president of the United States?")

AIMessage(content="As of my knowledge cutoff, the President of the United States is Joe Biden. He was inaugurated as the 46th President of the United States on January 20, 2021. Please note that this information may change over time, and I'll do my best to provide an updated answer if you ask again in the future!", response_metadata={'token_usage': {'completion_tokens': 69, 'prompt_tokens': 19, 'total_tokens': 88, 'completion_time': 0.221660008, 'prompt_time': 0.000647354, 'queue_time': 0.012466966000000001, 'total_time': 0.222307362}, 'model_name': 'llama3-70b-8192', 'system_fingerprint': 'fp_87cbfbbc4d', 'finish_reason': 'stop', 'logprobs': None}, id='run-d7596a73-37ba-433c-b0ac-06e74c309243-0', usage_metadata={'input_tokens': 19, 'output_tokens': 69, 'total_tokens': 88})

In [7]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = model | parser 
print(chain.invoke("Who is the president of the United States?"))

As of my knowledge cutoff, the President of the United States is Joe Biden. He was inaugurated as the 46th President of the United States on January 20, 2021. Please note that this information may change over time, and I'll do my best to provide an updated answer if you ask again in the future!
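The `|` operator chains runnables so that each one's output becomes the next one's input. `StrOutputParser` takes the `AIMessage` returned by the chat model and keeps only its `content` string. That is why this cell prints plain text rather than the full message object shown in the previous cell.

The toy model of that composition below uses nothing from LangChain. The `Step` and `FakeMessage` classes are hypothetical and much simpler than the real `Runnable` interface.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a LangChain Runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that feeds a's output into b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


class FakeMessage:
    """Hypothetical stand-in for AIMessage: the reply text lives in .content."""

    def __init__(self, content: str):
        self.content = content


fake_model = Step(lambda text: FakeMessage(f"echo: {text}"))  # returns a message object
to_string = Step(lambda message: message.content)  # keeps only the text, like StrOutputParser

chain = fake_model | to_string
print(chain.invoke("hello"))  # echo: hello
```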


In [8]:
from langchain_core.prompts import PromptTemplate

template = """
You are an assistant that provides answers to questions based on
a given context. 

Answer the question based on the context. If you can't answer the
question, reply "I don't know".

Be as concise as possible and go straight to the point.

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
print(prompt.format(context="Here is some context", question="Here is a question"))


You are an assistant that provides answers to questions based on
a given context. 

Answer the question based on the context. If you can't answer the
question, reply "I don't know".

Be as concise as possible and go straight to the point.

Context: Here is some context

Question: Here is a question
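`PromptTemplate.from_template` treats `{context}` and `{question}` as input variables. With the default f-string format, filling them behaves like Python's `str.format` for simple named placeholders like these.

The sketch below reproduces the formatted prompt with the standard library alone. It also lists the placeholder names using `string.Formatter().parse`, as an approximation of how the template's variables are discovered.

```python
import string

template = """
You are an assistant that provides answers to questions based on
a given context.

Answer the question based on the context. If you can't answer the
question, reply "I don't know".

Be as concise as possible and go straight to the point.

Context: {context}

Question: {question}
"""

# Collect the placeholder names; parse() yields (literal, field, spec, conversion).
variables = sorted(
    field for _, field, _, _ in string.Formatter().parse(template) if field
)
print(variables)  # ['context', 'question']

print(template.format(context="Here is some context", question="Here is a question"))
```

Literal braces in the template, such as a JSON snippet, would need to be doubled as `{{` and `}}`. Otherwise they are read as placeholders.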



In [9]:
chain = prompt | model | parser

chain.invoke({
    "context": "Anna's sister is Susan", 
    "question": "Who is Susan's sister?"
})


'Anna.'

In [10]:
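from operator import itemgetter

chain = (
    {
        # Pull the question out of the input dict and send it to the retriever;
        # the retrieved Document objects fill the {context} slot as-is.
        "context": itemgetter("question") | retriever,
        # Pass the question through unchanged to fill {question}.
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | parser
)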
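The dictionary at the head of the chain receives the same input, `{"question": ...}`, for every key. Its results become the variables that fill the prompt.

Because the retriever returns `Document` objects, `{context}` is filled with the Python representation of that list, metadata included. A common refinement is to join just the page text of the retrieved documents before it reaches the prompt.

The sketch below models that data flow in plain Python. `fake_retriever`, `format_docs`, and `build_inputs` are hypothetical helpers, and the fake retriever returns fixed strings instead of searching vectors. In the notebook itself, the equivalent change would be to pipe the retriever into a function that joins `doc.page_content`.

```python
from operator import itemgetter


def fake_retriever(question: str) -> list[str]:
    """Hypothetical stand-in for the FAISS retriever.

    It ignores the question and returns two fixed chunk texts so the data flow
    is easy to follow.
    """
    return [
        "The most common unscalable thing founders have to do is recruit users manually.",
        "Airbnb went door to door in New York recruiting new users.",
    ]


def format_docs(texts: list[str]) -> str:
    """Join retrieved chunk texts into one block of context."""
    return "\n\n".join(texts)


def build_inputs(inputs: dict) -> dict:
    # Mirrors the dict at the head of the chain: every entry sees the same
    # input dict, and the results become the prompt variables.
    get_question = itemgetter("question")
    return {
        "context": format_docs(fake_retriever(get_question(inputs))),
        "question": get_question(inputs),
    }


prompt_vars = build_inputs({"question": "How do founders get their first users?"})
print(prompt_vars["question"])
print(prompt_vars["context"])
```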

In [11]:
questions = [
    "What can you get away with when you only have a small number of users?",
    "What's the most common unscalable thing founders have to do at the start?",
    "What's one of the biggest things inexperienced founders and investors get wrong about startups?",
]

for question in questions:
    print(f"Question: {question}")
    print(f"Answer: {chain.invoke({'question': question})}")
    print("*************************\n")

Question: What can you get away with when you only have a small number of users?
Answer: You can get away with working hard to delight users when you only have a handful of them.
*************************

Question: What's the most common unscalable thing founders have to do at the start?
Answer: Sales.
*************************

Question: What's one of the biggest things inexperienced founders and investors get wrong about startups?
Answer: They think they can avoid doing sales by hiring someone to do it for them.
*************************

