In [30]:
pdf_file = "paul.pdf"
model = "llama3.2"

In [11]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(pdf_file)
pages = loader.load()

print(pages)
print(f"Number of pages: {len(pages)}")
print(f"Length of a page: {len(pages[2].page_content)}")
print(f"Content of a page: {pages[2].page_content}")


[Document(metadata={'source': 'paul.pdf', 'page': 0}, page_content='Want to start a startup? Get funded by Y Combinator .\nJuly 2013\nOne of the most common types of advice we give at Y Combinator\nis to do things that don\'t scale. A lot of would-be founders believe\nthat startups either take off or don\'t. You build something, make\nit available, and if you\'ve made a better mousetrap, people beat a\npath to your door as promised. Or they don\'t, in which case the\nmarket must not exist. [1]\nActually startups take off because the founders make them take\noff. There may be a handful that just grew by themselves, but\nusually it takes some sort of push to get them going. A good\nmetaphor would be the cranks that car engines had before they\ngot electric starters. Once the engine was going, it would keep\ngoing, but there was a separate and laborious process to get it\ngoing.\nRecruit\nThe most common unscalable thing founders have to do at the\nstart is to recruit users manually. Near

In [14]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,
    chunk_overlap=100
)

chunks = splitter.split_documents(pages)

print(f"Number of chunks: {len(chunks)}")
print(f"Length of a chunk: {len(chunks[2].page_content)}")
print(f"Content of a chunk: {chunks[2].page_content}")

Number of chunks: 23
Length of a chunk: 1459
Content of a chunk: 10% a week. And while 110 may not seem much better than 100,
if you keep growing at 10% a week you'll be surprised how big
the numbers get. After a year you'll have 14,000 users, and after
2 years you'll have 2 million.
You'll be doing different things when you're acquiring users a
thousand at a time, and growth has to slow down eventually. But
if the market exists you can usually start by recruiting users
manually and then gradually switch to less manual methods. [ 3]
Airbnb is a classic example of this technique. Marketplaces are so
hard to get rolling that you should expect to take heroic measures
at first. In Airbnb's case, these consisted of going door to door in
New York, recruiting new users and helping existing ones improve
their listings. When I remember the Airbnbs during YC, I picture
them with rolly bags, because when they showed up for tuesday
dinners they'd always just flown back from somewhere.
Fragile
Airbn
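
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-window splitter. It is a simplification, not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on paragraph and line boundaries before falling back to characters). The function name `split_with_overlap` is made up for this illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


demo = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(demo)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.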

In [15]:
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model=model)
vectorstore = FAISS.from_documents(chunks, embeddings)


In [44]:
retreiver = vectorstore.as_retriever()
retreiver.invoke("What can you get away with when you have a small number of users?")

[Document(metadata={'source': 'paul.pdf', 'page': 1}, page_content='larval startups by the standards of established ones. They\'re like\nsomeone looking at a newborn baby and concluding "there\'s no\nway this tiny creature could ever accomplish anything."\nIt\'s harmless if reporters and know-it-alls dismiss your startup.\nThey always get things wrong. It\'s even ok if investors dismiss\nyour startup; they\'ll change their minds when they see growth.\nThe big danger is that you\'ll dismiss your startup yourself. I\'ve\nseen it happen. I often have to encourage founders who don\'t see\nthe full potential of what they\'re building. Even Bill Gates made\nthat mistake. He returned to Harvard for the fall semester after\nstarting Microsoft. He didn\'t stay long, but he wouldn\'t have\nreturned at all if he\'d realized Microsoft was going to be even a\nfraction of the size it turned out to be. [4]\nThe question to ask about an early stage startup is not "is this\ncompany taking over the worl
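
What the retriever does can be sketched in a few lines: embed the query, then return the stored chunks whose vectors lie closest to it. The sketch below uses toy 2-D vectors and Euclidean distance. It assumes the FAISS index uses L2 distance and that the retriever returns four results by default, which is my understanding of the LangChain defaults and worth checking against your installed version. `top_k` is a made-up helper, not a LangChain or FAISS function.

```python
import math


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors closest to query_vec (Euclidean distance)."""
    distances = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    distances.sort()  # smallest distance first
    return [idx for _, idx in distances[:k]]


# Toy 2-D "embeddings"; real embeddings have hundreds or thousands of dimensions.
doc_vecs = [(0.0, 1.0), (1.0, 0.0), (0.9, 0.1), (-1.0, 0.0)]
query_vec = (1.0, 0.0)

nearest = top_k(query_vec, doc_vecs, k=2)
print(nearest)  # [1, 2]
```

A real index avoids scanning every vector when the collection is large, but with 23 chunks a brute-force scan like this one gives the same ranking.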

In [43]:
from langchain_ollama import ChatOllama

ollama_model = ChatOllama(model=model, temperature=0)
ollama_model.invoke("What is ANU in Canberra?")

AIMessage(content="ANU stands for Australian National University, which is a public research university located in Canberra, Australia. It is one of the country's most prestigious universities and is known for its academic excellence, research quality, and international reputation.\n\nThe Australian National University was established in 1946 as a result of the merger of three separate institutions: the Australian National Research Council, the Australian Commonwealth Scientific Organization, and the Australian School of Economics. Today, ANU is one of Australia's largest and most respected universities, with over 38,000 students from more than 130 countries.\n\nANU is particularly known for its strong programs in fields such as science, technology, engineering, mathematics (STEM), law, medicine, and the arts. The university has a strong research focus, with many of its researchers receiving international recognition for their work.\n\nSome of ANU's notable facilities include:\n\n* The

In [33]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = ollama_model | parser

print(chain.invoke("What is FIFA?"))

FIFA (Fédération Internationale de Football Association) is the international governing body of association football, also known as soccer. It was founded in 1904 and is responsible for setting the rules and regulations of the sport, organizing international tournaments, and promoting the growth of football worldwide.

FIFA's main objectives are:

1. To promote the development of football globally
2. To organize international competitions, such as the FIFA World Cup
3. To establish and enforce the laws of the game
4. To protect the rights of players, teams, and fans

Some of FIFA's key responsibilities include:

* Organizing the FIFA World Cup, which is held every four years
* Running the UEFA Champions League and other major club competitions
* Overseeing international matches between national teams
* Setting standards for stadium safety and player welfare
* Providing support for football development programs in countries around the world

FIFA also has a significant financial impact,
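
The `|` in `ollama_model | parser` is ordinary Python operator overloading. The toy class below shows the idea under the assumption that each step exposes an `invoke` method; `Step`, `fake_model`, and `fake_parser` are invented for this sketch and are not LangChain classes.

```python
class Step:
    """Wrap a function so that `a | b` builds a new Step running a, then b."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # The combined step feeds this step's output into the next one.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Stand-in for the chat model: returns a message-like dict.
fake_model = Step(lambda text: {"content": text.upper()})
# Stand-in for StrOutputParser: pulls the text out of the message.
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_model | fake_parser
print(toy_chain.invoke("what is fifa?"))  # WHAT IS FIFA?
```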

In [36]:
from langchain_core.prompts import PromptTemplate

template = """
You are an assistant that provides answers to questions based on a given context.
Answer the question based on the context. If you can't find the answer in the context, say "I don't know".
Be as concise as possible and go straight to the point.
Context: {context}
Question: {question}
"""

prompt = PromptTemplate.from_template(template)
print(prompt.format(context="This is context.", question="What is the question?"))


You are an assistant that provides answers to questions based on a given context.
Answer the question based on the context. If you can't find the answer in the context, say "I don't know".
Be as concise as possible and go straight to the point.
Context: This is context.
Question: What is the question?
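
A template like this is a format string whose placeholders become the prompt's input variables. The standard-library sketch below recovers those names and fills them in; it illustrates the idea only and is not how `PromptTemplate` is implemented. `template_variables` and `toy_template` are names made up for this example.

```python
from string import Formatter


def template_variables(template: str) -> list[str]:
    """List the distinct {placeholder} names in a format string, in order of appearance."""
    names = []
    for _, field_name, _, _ in Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names


toy_template = "Context: {context}\nQuestion: {question}\n"

variables = template_variables(toy_template)
print(variables)  # ['context', 'question']

filled = toy_template.format(context="This is context.", question="What is the question?")
print(filled)
```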



In [42]:
chain = (
    prompt 
    | ollama_model 
    | parser
)

chain.invoke({
    "context": "Australia is a country.",
    "question": "Where is the capital of Australia? What's its area?"
})

'The capital of Australia is Canberra. Its area is approximately 814 square kilometers (314 sq mi).'
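
In the cell above, `context` is a plain string. A retriever returns a list of `Document` objects instead, and dropping that list straight into the template would, as far as I can tell, insert its full `repr` including metadata. One common option is to join only the text of each document first. The sketch below uses a hypothetical `Doc` dataclass as a stand-in for `Document`, and `format_docs` is a helper name chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs) -> str:
    """Join the text of each document, separated by a blank line."""
    return "\n\n".join(doc.page_content for doc in docs)


docs = [Doc("First chunk.", {"page": 0}), Doc("Second chunk.", {"page": 1})]
context = format_docs(docs)
print(context)
```

The resulting string can be passed as `context` exactly like the hand-written one above.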

In [50]:
from operator import itemgetter

chain = (
    {
        "context": itemgetter("question") | retreiver,
        "question": itemgetter("question")
    }
    | prompt 
    | ollama_model 
    | parser
)
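
The dict at the head of this chain fans the same input out to several steps and collects the results under the dict's keys, which become the prompt's variables. As I understand it, LangChain coerces such a dict into a parallel runnable; the plain-Python sketch below mimics that behavior. `fan_out` and `fake_retriever` are invented for the illustration.

```python
from operator import itemgetter


def fan_out(steps: dict, payload: dict) -> dict:
    """Run every step on the same payload and collect the results under each step's key."""
    return {key: step(payload) for key, step in steps.items()}


def fake_retriever(question: str) -> list[str]:
    # Hypothetical stand-in for the vector store retriever.
    return [f"chunk relevant to: {question}"]


steps = {
    "context": lambda payload: fake_retriever(itemgetter("question")(payload)),
    "question": itemgetter("question"),
}

prompt_inputs = fan_out(steps, {"question": "What is FIFA?"})
print(prompt_inputs)

# A misspelled key fails at lookup time with a KeyError naming the bad key.
try:
    itemgetter("quesiton")({"question": "What is FIFA?"})
except KeyError as err:
    print(f"KeyError: {err}")  # KeyError: 'quesiton'
```

Because the lookup only happens when the chain is invoked, a typo in `itemgetter(...)` surfaces at call time, not when the chain is built.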

In [49]:
questions = [
    "What can you get away with when you have a small number of users?",
    "What's the most common unscalable thing founders have to do at the start?",
    "What's one of the biggest things inexperienced founders and investors get wrong about startups?"
]

for question in questions:
    print(f"Question: {question}")
    print(f"Answer: {chain.invoke({'question': question})}")
    print("***********************")
