<a href="https://colab.research.google.com/github/SandeshLekhwani/GPTPoweredChatbot/blob/colab/Widur_Chatbot!.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Welcome to the Lenny Chatbot Colab!

This Colab notebook contains all of the code you need to make a basic chatbot that will answer questions about a corpus of text. Colab is a cloud-based programming environment which will let you run all of this code from your browser.

At each step, follow the written instructions and press the "play" button next to the code sample in order to run it.

**Important Note:** This is a basic chatbot running on a limited selection of articles. It's only a starting point to show you what's possible!

If you have questions, feel free to reach out to me on Twitter at [@danshipper](https://www.twitter.com/danshipper).

## 1. Download our text corpus

The first thing we need to do is download the text our chatbot is going to use as reference material for answering questions.

In the Lenny Chatbot, I used every article he's written as the text corpus. But for this public codebase, I've collected two articles from his archive that we can use as a starting point.

These are the articles I'm using:

- [What is good retention?](https://www.lennysnewsletter.com/p/what-is-good-retention-issue-29)
- [How the biggest consumer apps got their first 1,000 users](https://www.lennysnewsletter.com/p/how-the-biggest-consumer-apps-got)

You can replace these articles with any text corpus you want, however.


In [None]:
! git clone https://github.com/EveryInc/Lenny-Newsletter-Corpus

Cloning into 'Lenny-Newsletter-Corpus'...
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 9 (delta 1), reused 0 (delta 0), pack-reused 7
Receiving objects: 100% (9/9), 43.33 KiB | 504.00 KiB/s, done.
Resolving deltas: 100% (2/2), done.
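
If you swap in your own corpus, it can help to confirm what is actually in the folder before you pay to embed it. The helper below is a minimal sketch using only the standard library; `list_corpus_files` is a hypothetical name, not part of this notebook or of any library used here.

```python
from pathlib import Path

def list_corpus_files(directory):
    """Return (relative path, size in bytes) for every visible file under `directory`."""
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"Corpus folder not found: {root}")
    files = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip hidden files and anything inside hidden folders such as .git
        if path.is_file() and not any(part.startswith(".") for part in relative.parts):
            files.append((relative.as_posix(), path.stat().st_size))
    return files

# Example (in Colab, after the clone above):
# for name, size in list_corpus_files("Lenny-Newsletter-Corpus"):
#     print(f"{size:>8} bytes  {name}")
```

An unexpectedly large file in this listing is a hint that the indexing step later on will cost more than you planned.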


## 2. Install our dependencies and define our functions

In this section we'll install GPT Index and Langchain. We'll also define the functions that we'll use later to construct our index and query it.

First, let's install our dependencies.

In [2]:
!pip install gpt_index
!pip install langchain
!pip install llama-index
!pip install pypdf

Collecting gpt_index
  Downloading gpt_index-0.8.10-py3-none-any.whl (706 kB)
Collecting tiktoken (from gpt_index)
  Downloading tiktoken-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
Collecting dataclasses-json (from gpt_index)
  Downloading dataclasses_json-0.5.14-py3-none-any.whl (26 kB)
Collecting langchain>=0.0.262 (from gpt_index)
  Downloading langchain-0.0.273-py3-none-any.whl (1.6 MB)
Collecting openai>=0.26.4 (from gpt_index)
  Downloading openai-0.27.9-py3-none-any.whl (75 kB)

Now we'll upgrade both libraries to their latest releases, and then define the functions we're going to use later to construct our index and query it.

In [3]:
!pip install --upgrade llama_index langchain
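
None of the install commands above pin a version, so the code in this notebook may behave differently depending on what pip resolves on the day you run it. As a rough sketch, you could record what actually got installed before going further; `installed_version` is a hypothetical helper built on the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

# Report on the packages this notebook installs
for name in ("gpt_index", "llama-index", "langchain", "pypdf"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If a later cell fails on import, comparing these versions with the ones in the install log above (for example gpt_index 0.8.10 and langchain 0.0.273) is a reasonable first check.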



In [16]:
from llama_index import (
    GPTVectorStoreIndex,
    LLMPredictor,
    PromptHelper,
    ServiceContext,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)
from langchain.chat_models import ChatOpenAI
import os
from IPython.display import Markdown, display

# folder where the finished index is saved and later reloaded from
PERSIST_DIR = "data"

def construct_index(directory_path):
    # set maximum input size
    max_input_size = 4096
    # set number of output tokens
    num_outputs = 256
    # set maximum chunk overlap (as a ratio of the chunk size)
    max_chunk_overlap = 0.1
    # set chunk size limit
    chunk_size_limit = 600
    api_key = os.environ["OPENAI_API_KEY"]

    # define LLM; gpt-3.5-turbo is a chat model, so use LangChain's chat wrapper
    llm_predictor = LLMPredictor(llm=ChatOpenAI(openai_api_key=api_key, temperature=0, model_name="gpt-3.5-turbo", max_tokens=num_outputs))
    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

    # read every file in the folder, then chunk and embed it
    documents = SimpleDirectoryReader(directory_path).load_data()
    index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)

    # save the index so ask_lenny() can reload it without re-embedding
    index.storage_context.persist(persist_dir=PERSIST_DIR)
    return index

def ask_lenny():
    # reload the index that construct_index() saved
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
    query_engine = index.as_query_engine()

    while True:
        query = input("What do you want to ask Lenny? ")
        # an empty question ends the session
        if not query.strip():
            break
        response = query_engine.query(query)
        display(Markdown(f"Lenny Bot says: <b>{response.response}</b>"))


## 3. Set OpenAI API Key

In order to run this notebook you'll need an API key from OpenAI.

If you don't have one already, you can grab one by [signing up](https://platform.openai.com/overview). Then click your account icon on the top right of the screen and select "View API Keys". Create an API key.

Then run the code below and paste your key into the text input.



In [13]:
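import os
from getpass import getpass

# getpass hides the key as you type, so it is not saved in the notebook output
os.environ["OPENAI_API_KEY"] = getpass("Paste your OpenAI API key here and hit enter:")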


Paste your OpenAI API key here and hit enter:
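
Before spending money in the next step, you may want to confirm that the key really landed in the environment. The sketch below is one way to do that; `require_api_key` and `mask` are hypothetical helpers, not part of the OpenAI or LlamaIndex libraries.

```python
import os

def require_api_key(var_name="OPENAI_API_KEY"):
    """Return the key stored in the environment, or raise a readable error."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set. Run the cell above and paste your key.")
    return key

def mask(key):
    """Hide all but the last four characters so the key is safer to print."""
    return "*" * max(len(key) - 4, 0) + key[-4:]

# Example:
# print("Using key:", mask(require_api_key()))
```

Note that this only checks that a key is present. It does not verify with OpenAI that the key is valid.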


## 4. Construct Index

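Now we're going to construct our index. This will take every file in the folder you pass to `construct_index`, split it into chunks, and embed each chunk with OpenAI's embeddings API. The cell below points at '/content/Widur_Archives'; to index the articles cloned in step 1, pass 'Lenny-Newsletter-Corpus' instead.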
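
To make "split it into chunks" concrete, here is a simplified sketch of overlapping chunking. It is an illustration only: it counts whitespace-separated words rather than model tokens, and it is not the splitter LlamaIndex actually uses. The defaults of 600 and 0.1 simply mirror `chunk_size_limit` and `max_chunk_overlap` in `construct_index`; `split_into_chunks` is a hypothetical name.

```python
def split_into_chunks(text, chunk_size=600, overlap_ratio=0.1):
    """Split text into word chunks, each sharing a few words with its predecessor."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    words = text.split()
    overlap = int(chunk_size * overlap_ratio)  # words repeated between neighbours
    step = chunk_size - overlap                # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks

# Example: 25 words, chunks of 10 words with a 2-word overlap
sample = " ".join(f"w{i}" for i in range(25))
for chunk in split_into_chunks(sample, chunk_size=10, overlap_ratio=0.2):
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the price of embedding a few words twice.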

**Important Note:** This step costs money. Running it on the text corpus we've given you by default should only cost $0.03 in total. If you use other text, keep in mind that the cost grows with its length, so check the size of your corpus first.
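
For a rough sense of scale before you run this on your own text, you could estimate the token count from the character count. The sketch below assumes about four characters per token, a common rule of thumb for English text rather than an exact figure, and it takes the price as an argument because rates change; `estimate_embedding_cost` and the price in the example are hypothetical, so look up the current rate on OpenAI's pricing page.

```python
def estimate_embedding_cost(texts, price_per_1k_tokens, chars_per_token=4.0):
    """Return (approximate token count, approximate cost) for a list of strings."""
    total_chars = sum(len(text) for text in texts)
    approx_tokens = total_chars / chars_per_token
    approx_cost = approx_tokens / 1000 * price_per_1k_tokens
    return approx_tokens, approx_cost

# Example with a placeholder price of 0.5 per 1,000 tokens (not a real rate)
tokens, cost = estimate_embedding_cost(["a" * 4000, "b" * 4000], price_per_1k_tokens=0.5)
print(f"about {tokens:.0f} tokens, about {cost:.2f} in total")
```

This only covers embedding the corpus. Each question you ask later also sends text to the model and is billed separately.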


In [14]:
construct_index('/content/Widur_Archives')



<llama_index.indices.vector_store.base.VectorStoreIndex at 0x7d8116c67ac0>

## 5. Ask Questions!

Now we'll run the "ask_lenny" function we defined above.

This will prompt you to input a question. It will then find the chunks of text most likely to answer it and use an OpenAI language model to summarize an answer from those chunks.
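
The "find chunks of text" step is typically a similarity search over embeddings. The sketch below shows the idea with hand-made three-dimensional vectors standing in for real embeddings; the vectors, the chunk labels, and the `top_k_chunks` helper are all made up for illustration and are not LlamaIndex's implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vector, chunk_vectors, k=2):
    """Return the k chunk labels whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), chunk)
              for chunk, vec in chunk_vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

# Toy "embeddings" for three chunks and one question
chunk_vectors = {
    "retention benchmarks": [1.0, 0.0, 0.1],
    "first 1,000 users": [0.0, 1.0, 0.1],
    "unrelated": [0.0, 0.0, 1.0],
}
print(top_k_chunks([0.9, 0.1, 0.0], chunk_vectors, k=2))
```

The selected chunks are then placed into the prompt alongside your question, which is why the bot can only draw on text that was indexed.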

Remember, in this public Colab file we're only using two of Lenny's articles for our corpus. So it will only answer questions from:

- [What is good retention?](https://www.lennysnewsletter.com/p/what-is-good-retention-issue-29)
- [How the biggest consumer apps got their first 1,000 users](https://www.lennysnewsletter.com/p/how-the-biggest-consumer-apps-got)


A few sample questions you can ask:

- What is good retention for a consumer social product?

- How did DoorDash get its first users?

- How did LinkedIn get started?

Again, this step costs money. So be aware!

In [None]:
ask_lenny()

What do you want to ask Lenny? what to do if i am feeling sleepless


Lenny Bot says: <b>Engage in activities that promote relaxation and help you fall asleep. This can include practicing good sleep hygiene, such as maintaining a consistent sleep schedule, creating a comfortable sleep environment, avoiding stimulating activities before bed, and practicing relaxation techniques like deep breathing or meditation. It may also be helpful to limit exposure to electronic devices and avoid consuming caffeine or alcohol close to bedtime. If sleeplessness persists, it may be beneficial to consult with a healthcare professional for further guidance.</b>

What do you want to ask Lenny? Tell me what the remedy is, for one who is sleepless and burning


Lenny Bot says: <b>The remedy for someone who is sleepless and burning is not mentioned in the given context.</b>

What do you want to ask Lenny? who is a pandit


Lenny Bot says: <b>A pandit is someone who regularly performs praiseworthy deeds and rejects censurable actions. They have a firm conviction in the teachings of the Veda and are generous. A pandit is not swayed by anger, joy, pride, false modesty, stupefaction, or vanity, and remains focused on the four legitimate goals of life. They keep their projects and recommendations concealed from competitors and only reveal them after they have been accomplished. A pandit reflects daily on both virtue and success, prioritizing success over mere pleasure. They are strongly motivated, act to the best of their ability, and consider nothing as insignificant. A pandit understands quickly, listens patiently, pursues objectives with considerate planning, and does not waste time discussing the affairs of others without being asked. They know the impermanent nature of all creatures, the inter-connectedness of all acts, and the methodology for people to attain their objectives. A pandit speaks boldly, can converse on various subjects, knows the science of debate, possesses intellectual acuity, and can interpret the meaning of texts. Their studies are regulated by reason and they never abstain from paying respect to those who are good. A pandit draws wisdom from those who are wise and succeeds in achieving happiness by attending to both virtue and profit. Finally, a pandit, even after attaining immense wealth or vast learning, does not behave arrogantly.</b>

What do you want to ask Lenny? i am confused whether i should do a startup or stay in my current job


Lenny Bot says: <b>One should consider various factors before making a decision between starting a new business or staying in their current job. Factors such as personal goals, financial stability, risk tolerance, passion for entrepreneurship, and market opportunities should be taken into account. It is important to carefully evaluate the potential benefits and challenges of both options before making a decision. Seeking advice from mentors or professionals in the field can also provide valuable insights. Ultimately, the decision should be based on individual circumstances and aspirations.</b>