[Source](https://youtu.be/ckb4DnHLBrU?si=qtCP3pLkzZVhHkcX)

# Install Packages

In [1]:
!pip install langchain
!pip install pypdf
!pip install unstructured
!pip install sentence_transformers
!pip install "pinecone-client<3"  # pinecone.init() used below was removed in pinecone-client 3.0
!pip install llama-cpp-python
!pip install huggingface_hub
!pip install -q transformers einops accelerate langchain bitsandbytes



# Imports

In [2]:
from langchain.document_loaders import PyPDFLoader, OnlinePDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Pinecone
from sentence_transformers import SentenceTransformer
from langchain.chains.question_answering import load_qa_chain
import pinecone
import os

In [3]:
from transformers import AutoTokenizer

In [4]:
loader = PyPDFLoader("/content/graham.pdf")

In [5]:
data = loader.load()

In [6]:
type(data)

list

In [7]:
len(data)

1799

In [8]:
data[3]

Document(page_content='Essays\n001 This Year We Can End the Death Penalty in California\n002 Programming Bottom-Up\n005 Lisp for Web-Based Applications\n006 Beating the Averages\n007 Java\'s Cover\n008 Being Popular\n009 Five Questions about Language Design\n010 The Roots of Lisp\n011 The Other Road Ahead\n012 What Made Lisp Different\n013 Why Arc Isn\'t Especially Object-Oriented\n014 Taste for Makers\n015 What Languages Fix\n016 Succinctness is Power\n017 Revenge of the Nerds\n018 A Plan for Spam\n019 Design and Research\n020 Better Bayesian Filtering\n021 Why Nerds are Unpopular\n022 The Hundred-Year Language\n023 If Lisp is So Great\n024 Hackers and Painters\n025 Filters that Fight Back\n026 What You Can\'t Say\n027 The Word "Hacker"\n028 How to Make Wealth\n029 Mind the Gap', metadata={'source': '/content/graham.pdf', 'page': 3})

# Split data

In [9]:
text_splitter=RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
docs=text_splitter.split_documents(data)

In [10]:
len(docs)

7120

In [11]:
docs[2]

Document(page_content="Essays\n001 This Year We Can End the Death Penalty in California\n002 Programming Bottom-Up\n005 Lisp for Web-Based Applications\n006 Beating the Averages\n007 Java's Cover\n008 Being Popular\n009 Five Questions about Language Design\n010 The Roots of Lisp\n011 The Other Road Ahead\n012 What Made Lisp Different\n013 Why Arc Isn't Especially Object-Oriented\n014 Taste for Makers\n015 What Languages Fix\n016 Succinctness is Power\n017 Revenge of the Nerds\n018 A Plan for Spam\n019 Design and Research", metadata={'source': '/content/graham.pdf', 'page': 3})
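
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal pure-Python sketch of fixed-width character chunking. It is a simplified stand-in, not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter first tries to break on separators such as paragraph and line breaks). The function name `split_fixed` is hypothetical.

```python
def split_fixed(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


sample = "x" * 1200
print([len(c) for c in split_fixed(sample, chunk_size=500, chunk_overlap=0)])  # [500, 500, 200]
```

In the notebook, 1799 pages become 7120 chunks, which is roughly four chunks per page.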

# Setup Environment

In [12]:
!huggingface-cli login --token "XXX"
PINECONE_API_KEY = "XXX"
PINECONE_API_ENV = "gcp-starter"

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful
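
Hard-coding keys in a notebook cell makes them easy to leak when the notebook is shared. One possible alternative, sketched below, reads them from environment variables. The helper `read_setting` is hypothetical, and using `PINECONE_API_KEY` and `PINECONE_API_ENV` as environment variable names is an assumption that simply mirrors the Python names above.

```python
import os
from typing import Optional


def read_setting(name: str, default: Optional[str] = None) -> str:
    """Return the environment variable `name`, or `default` if it is unset."""
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(f"Set the {name} environment variable first")
    return value


# Placeholder for demonstration only; set the real key outside the notebook.
os.environ.setdefault("PINECONE_API_KEY", "XXX")

PINECONE_API_KEY = read_setting("PINECONE_API_KEY")
PINECONE_API_ENV = read_setting("PINECONE_API_ENV", default="gcp-starter")
```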


# Initialize Pinecone

In [13]:
# initialize pinecone
pinecone.init(
    api_key=PINECONE_API_KEY,  # find at app.pinecone.io
    environment=PINECONE_API_ENV  # next to api key in console
)
index_name = "rg-lc-pc-graham" # put in the name of your pinecone index here

# Embeddings

In [14]:
embeddings=HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')

In [15]:
# First run only: embed every text chunk and upload it to the Pinecone index
#docsearch=Pinecone.from_texts([t.page_content for t in docs], embeddings, index_name=index_name)

# Later runs: connect to the index that already holds the embeddings
docsearch = Pinecone.from_existing_index(index_name, embeddings)


# Similarity Search

In [16]:
query = "What does Paul Graham say about life?"

In [17]:
response=docsearch.similarity_search(query)

In [18]:
type(response)

list

In [19]:
response

[Document(page_content='A collection of Paul Graham\nwritings\nPaul Graham\nAll writing are property of © Paul Graham. The cover image was\ntaken from startupquote.com/post/868392835. The code to generate\nthis ebook was made by Omar Olivares, drop star if you enjoy.\nhttps://github.com/ofou/graham-essays', metadata={}),
 Document(page_content='To do really great things, you have to seek out questions people\ndidn\'t even realize were questions. There have probably been other\npeople who did this as well as Newton, for their time, but Newton is\nmy model of this kind of thought. I can just begin to understand\nwhat it must have felt like for him.\nYou only get one life. Why not do something huge? The phrase\n"paradigm shift" is overused now, but Kuhn was onto something.\nAnd you know more are out there, separated from us by what will', metadata={}),
 Document(page_content='support. Where is the man bites dog in that? I didn\'t hear the speech,\nbut I could probably tell you exactly wha
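
Conceptually, `similarity_search` embeds the query and returns the chunks whose vectors lie closest to it (LangChain's default is `k=4`). The sketch below shows that ranking step in plain Python with toy vectors. It assumes cosine similarity, but the metric of the real Pinecone index is chosen when the index is created and is not shown in this notebook. The names `cosine` and `top_k` are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return the indices of the k chunks most similar to the query, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy 3-dimensional "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
chunk_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [1.0, 0.2, 0.0]
print(top_k(query_vec, chunk_vecs, k=2))  # [1, 0]
```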

# Load Model

In [21]:
import transformers
import torch

In [22]:
model="meta-llama/Llama-2-7b-chat-hf"
tokenizer=AutoTokenizer.from_pretrained(model)
pipeline=transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    max_length=1000,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id
    )

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
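
The pipeline sets `do_sample=True` and `top_k=10`, so at each generation step the next token is drawn at random from the 10 most likely candidates. This is why the same query can produce different answers on different runs, as cells In [30] and In [32] show. The sketch below illustrates the idea on a toy vocabulary. It is not the Transformers implementation, and `sample_top_k` and the logit values are hypothetical.

```python
import math
import random


def sample_top_k(logits: dict[str, float], k: int, rng: random.Random) -> str:
    """Sample one token from the k highest-scoring candidates."""
    # Keep only the k tokens with the largest logits.
    kept = sorted(logits.items(), key=lambda item: item[1], reverse=True)[:k]
    # Unnormalized softmax weights (subtract the max for numerical stability).
    top_score = kept[0][1]
    tokens = [token for token, _ in kept]
    weights = [math.exp(score - top_score) for _, score in kept]
    return rng.choices(tokens, weights=weights, k=1)[0]


toy_logits = {"startup": 3.0, "essay": 2.5, "Lisp": 1.0, "banana": -4.0}
rng = random.Random(0)
samples = {sample_top_k(toy_logits, k=2, rng=rng) for _ in range(100)}
print(samples)  # only "startup" and "essay" can appear with k=2
```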

In [23]:
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from huggingface_hub import hf_hub_download
from langchain.chains.question_answering import load_qa_chain

In [24]:
from langchain.llms import HuggingFacePipeline

In [25]:
llm=HuggingFacePipeline(pipeline=pipeline, model_kwargs={'temperature':0})

In [26]:
chain=load_qa_chain(llm, chain_type="stuff")
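
With `chain_type="stuff"`, all retrieved documents are placed ("stuffed") into a single prompt together with the question. The sketch below shows the idea. The template wording is an illustration and should not be read as LangChain's exact prompt, and `build_stuff_prompt` is a hypothetical name.

```python
def build_stuff_prompt(chunks: list[str], question: str) -> str:
    """Concatenate all chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, just say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


prompt = build_stuff_prompt(
    ["You only get one life. Why not do something huge?", "A second retrieved chunk."],
    "What does Paul Graham say about life?",
)
print(prompt)
```

Because everything goes into one prompt, the retrieved chunks and the generated answer have to fit the generation budget together: `max_length=1000` in the pipeline counts prompt tokens as well as new tokens.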

In [27]:
query = "What does Paul Graham say about life?"
response=docsearch.similarity_search(query)

In [28]:
chain.run(input_documents=response, question=query)

' Paul Graham\'s essays offer insights into his perspective on life, which he shares through his writing. In the provided texts, he emphasizes the importance of seeking out questions that people didn\'t even realize were questions, and of doing something huge in one\'s life. He also notes that the phrase "paradigm shift" is overused but that Kuhn was onto something, and that news stories about things going wrong can be seen as an opportunity for personal growth. However, he does not explicitly state his views on life in the provided texts.'

In [30]:
query = "What does Paul Graham say about startup?"
response=docsearch.similarity_search(query)
print(chain.run(input_documents=response, question=query))

 Paul Graham is a well-known entrepreneur, programmer, and writer who has written extensively about startups. According to him, a successful startup requires a clear and concise explanation of its idea, as it has to convince not just the founder's partner but also their colleagues. He also emphasizes the importance of solving an important problem, as this will make the startup sound convincing to potential investors. Additionally, Graham suggests that startups should consciously optimize their explanations to be clear and concise, as this will help them to secure investments.
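
The cells that follow repeat the same two steps: retrieve chunks, then run the chain. A small helper can wrap that pattern. This is a sketch: `ask` is a hypothetical name, and `FakeStore` and `FakeChain` are stand-ins so the example runs without Pinecone or a model. In the notebook you would pass `chain` and `docsearch` instead.

```python
def ask(chain, docsearch, query: str, k: int = 4) -> str:
    """Retrieve the k most similar chunks and run the QA chain on them."""
    documents = docsearch.similarity_search(query, k=k)
    return chain.run(input_documents=documents, question=query)


class FakeStore:
    """Stand-in for the vector store; returns k dummy chunks."""

    def similarity_search(self, query, k=4):
        return [f"chunk {i}" for i in range(k)]


class FakeChain:
    """Stand-in for the QA chain; reports what it received."""

    def run(self, input_documents, question):
        return f"{len(input_documents)} chunks for: {question}"


print(ask(FakeChain(), FakeStore(), "What does Paul Graham say about startup?"))
```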


In [31]:
from IPython.display import display, Markdown

In [32]:
query = "What does Paul Graham say about startup?"
response=docsearch.similarity_search(query)
ans = chain.run(input_documents=response, question=query)
display(ans)

" Paul Graham, a well-known startup investor and writer, has shared his insights and experiences on startups through various writings. According to him, there can't be more than a couple thousand people who know firsthand what happens in the first month of a successful startup. He also emphasizes the importance of a clear and concise explanation of a startup, as it has to convince not just the partner you're talking to but also when that partner retells it to colleagues. Additionally, he suggests that startups should consciously optimize their explanations for this purpose."

 Paul Graham, a well-known startup investor and writer, has shared his insights and experiences on startups through various writings. According to him, there can't be more than a couple thousand people who know firsthand what happens in the first month of a successful startup. He also emphasizes the importance of a clear and concise explanation of a startup, as it has to convince not just the partner you're talking to but also when that partner retells it to colleagues. Additionally, he suggests that startups should consciously optimize their explanations for this purpose.

In [34]:
query = "What does Paul Graham say about life?"
response=docsearch.similarity_search(query)
ans = chain.run(input_documents=response, question=query)
display(Markdown(ans))

 Paul Graham's views on life are not explicitly stated in the given text. However, based on the context and the themes he touches upon, it can be inferred that he believes in seeking out new questions and challenges in life, rather than conforming to conventional norms or expectations. He also emphasizes the importance of taking risks and pursuing one's passions, as well as understanding the nature of reality and the world around us. Additionally, he seems to value the idea of personal growth and learning, as evidenced by his reference to the "man bites dog" phrase and the idea that news stories about negative events can be seen as opportunities for growth.

In [36]:
query = "Which one is Paul Graham's longest essay?"
response=docsearch.similarity_search(query)
ans = chain.run(input_documents=response, question=query)
display(Markdown(ans))



Paul Graham's longest essay is "Newton's Slavery". It is 14 pages long.

In [37]:
query = "How many essays has Paul Graham written?"
response=docsearch.similarity_search(query)
ans = chain.run(input_documents=response, question=query)
display(Markdown(ans))

 Paul Graham has written 4 essays.

Please answer the question based on the provided context.
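
The answer above is wrong: the index page shown in `data[3]` alone lists 27 numbered titles. A likely reason (an assumption, not verified in this notebook) is that the chain only sees the few chunks returned by `similarity_search`, so it cannot answer questions about the whole corpus. For such questions, processing the loaded pages directly is more reliable. The sketch below parses numbered titles from an excerpt of that index page. `list_titles` is a hypothetical helper.

```python
import re

# Excerpt of data[3].page_content from the notebook output above.
index_page = (
    "Essays\n"
    "001 This Year We Can End the Death Penalty in California\n"
    "002 Programming Bottom-Up\n"
    "005 Lisp for Web-Based Applications\n"
    "006 Beating the Averages\n"
    "007 Java's Cover\n"
)


def list_titles(page_text: str) -> list[tuple[str, str]]:
    """Return (number, title) pairs for lines shaped like '001 Some Title'."""
    pattern = re.compile(r"^(\d{3}) (.+)$", flags=re.MULTILINE)
    return pattern.findall(page_text)


titles = list_titles(index_page)
print(len(titles))  # 5
print(titles[1])    # ('002', 'Programming Bottom-Up')
```

Which other pages of `data` hold the rest of the index is not shown in the notebook, so the page range to scan is left open here.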

In [38]:
query = "Summarize the essay titled 'startup in 13 sentences'"
response=docsearch.similarity_search(query)
ans = chain.run(input_documents=response, question=query)
display(Markdown(ans))

 The essay 'Startup in 13 Sentences' by Paul Graham is a concise guide for entrepreneurs on how to start a successful startup. The essay highlights the importance of focusing on a specific problem and solving it, rather than trying to create a product that appeals to a broad audience. Graham also emphasizes the need to identify a compelling phrase that describes the startup's purpose and to keep the plans focused. Additionally, he suggests that it is better to make a few people really happy than to make a lot of people semi-happy. Overall, the essay provides practical advice for entrepreneurs who want to create a successful startup.

In [41]:
query = "What are the 13 sentences in the essay titled: 'startups in 13 sentences'?"
response=docsearch.similarity_search(query)
ans = chain.run(input_documents=response, question=query)
display(Markdown(ans))

 The 13 sentences in the essay titled 'Startups in 13 Sentences' are:
1. Want to start a startup? Get funded by Y Combinator.
2. Watch how this essay was written.
3. One of the things I always tell startups is a principle I learned from Paul Buchheit: it's better to make a few people really happy than to make a lot of people semi-happy.
4. I was saying recently to a reporter that if I could only tell startups 10 things, this would be one of them.
5. Then I thought: what would the other 9 be?
6. What Startups Are Really Like.
7. Want to start a startup? Get funded by Y Combinator.
8. I'm in the unusual position of being able to test the essays I write about startups.
9. A Word to the Resourceful.
10. Frighteningly Ambitious Startup Ideas.
111. How Y Combinator Started.
12. Writing and Speaking.
13. The Top of My Todo List.
14. Black Swan Farming.
15. Startup = Growth.
16. How to Get Startup Ideas.
17. Startup Investing Trends.
18. Do Things that Don't Scale.
19