In [1]:
!pip install -Uqqq pip --progress-bar off
!pip install -qqq langchain==0.0.139 --progress-bar off
!pip install -qqq openai==0.27.4 --progress-bar off
!pip install -Uqqq watermark==2.3.1 --progress-bar off
!pip install -Uqqq chromadb==0.3.21 --progress-bar off
!pip install -Uqqq tiktoken==0.3.3 --progress-bar off

In [13]:
# Load the 'watermark' IPython extension, which records the Python modules in use,
# their versions, and machine details to help reproduce the environment.

%load_ext watermark

The watermark extension is already loaded. To reload it, use:
  %reload_ext watermark


In [14]:
# Import the necessary libraries
import os
import openai

import textwrap
import chromadb
import langchain
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import WebBaseLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.vectorstores import Chroma

In [7]:
# Folder that contains the API key file. Replace this with the path on your
# machine (or in your Google Colab session).
folder_path = '/Users/jasper/Desktop/LangChain/'

# Make that folder the working directory
os.chdir(folder_path)

In [10]:
# Read the text file containing the API key, dropping any trailing newline
with open(os.path.join(folder_path, "Jasper_OpenAI_API_Key.txt"), "r") as f:
    openai.api_key = f.read().strip()

In [11]:
# Expose the key to LangChain through the OPENAI_API_KEY environment variable
os.environ["OPENAI_API_KEY"] = openai.api_key
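
A key file usually ends with a newline, and the key may also already be set in the environment. The following is a minimal sketch of a slightly more defensive loader under those assumptions. `load_api_key` is a hypothetical helper name, not part of `openai` or `langchain`.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(key_file: Optional[str] = None, env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or else from a text file.

    Hypothetical helper for illustration only.
    """
    # Prefer a key that is already set in the environment
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    if key_file is None:
        raise ValueError(f"{env_var} is not set and no key file was given")
    # strip() removes the trailing newline that most editors add
    key = Path(key_file).read_text().strip()
    if not key:
        raise ValueError(f"{key_file} is empty")
    return key
```

With this helper, the two cells above could collapse into a single call such as `os.environ["OPENAI_API_KEY"] = load_api_key("Jasper_OpenAI_API_Key.txt")`.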

In [15]:
def print_response(response: str) -> None:
    """Print a response wrapped to 100 characters per line."""
    print("\n".join(textwrap.wrap(response, width=100)))

In [20]:
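# Completion model with temperature=0 for near-deterministic output.
# This LangChain version forwards `model` through model_kwargs, hence the warning below.
model = OpenAI(model='gpt-3.5-turbo-instruct', temperature=0)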

                    model was transfered to model_kwargs.
                    Please confirm that model is what you intended.


In [21]:
print(
    model(
        "You're Dwight K. Schrute from the Office. Suggest 5 places to visit in Scranton that are connected to the TV show."
    )
)



1. Dunder Mifflin Paper Company - This is the fictional paper company where you work in the show. While there isn't an actual Dunder Mifflin office in Scranton, you can visit the building that was used as the exterior shot for the office in the show.

2. Poor Richard's Pub - This is the bar where you and your coworkers often go for drinks and karaoke. In real life, Poor Richard's is a popular bar and restaurant in Scranton that often hosts Office-themed events and trivia nights.

3. Steamtown National Historic Site - This train museum was featured in the episode "The Convention" where you and your coworkers attend a paper convention. You can visit the museum and see the train cars used in the show.

4. Lake Scranton - This is the lake where you and your coworkers go for a company picnic in the episode "The Merger." You can visit the lake and have a picnic of your own, just like in the show.

5. The Electric City sign - This iconic sign is featured in the opening credits of the show a

## Q&A Over a Document

In [22]:
loader = WebBaseLoader(
    "https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm"
)

In [25]:
documents = loader.load()
len(documents)

1

In [26]:
document = documents[0]
document.__dict__.keys()

dict_keys(['page_content', 'metadata'])

In [27]:
document.page_content[:100]

"\n\n\n\n\nTwitter's Recommendation Algorithm\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nEngineering\n\n\n\n\n\nBack\n\n\n\n\n\nEng"

In [28]:
document.metadata

{'source': 'https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm'}

In [29]:
index = VectorstoreIndexCreator().from_loaders([loader])

Using embedded DuckDB without persistence: data will be transient
Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')).
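
Conceptually, building the index means splitting the page into chunks, embedding each chunk, and storing the vectors so that a query can be matched against them. The sketch below is a toy, offline illustration of that idea, not LangChain's implementation. `split_text`, `embed`, `cosine`, and `top_k` are hypothetical helpers, the word-count "embedding" stands in for `OpenAIEmbeddings`, a plain Python list stands in for Chroma, and the sample chunks are made up.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def split_text(text: str, chunk_size: int = 1000) -> List[str]:
    """Cut text into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: List[str], k: int = 1) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Made-up sample chunks standing in for pieces of the loaded page
chunks = [
    "Candidates are ranked by a machine learning model.",
    "Filters and heuristics are applied after ranking.",
    "Ads are blended in before the timeline is served.",
]
best_score, best_chunk = top_k("How are candidates ranked?", chunks, k=1)[0]
print(best_chunk)
```

A real embedding model captures meaning rather than word overlap, but the retrieval step has the same shape: score every stored chunk against the query and keep the best `k`.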


In [35]:
# Completion model used to answer the query; temperature=0 keeps the output near-deterministic
llm = OpenAI(temperature=0, model="gpt-3.5-turbo-instruct")

# Ask the index a question, answered in Dwight's voice
query = """
You're Dwight K. Schrute from the Office.
Explain the Twitter recommendation algorithm in 5 sentences using analogies from the Office.
"""
print_response(index.query(query, llm=llm))

                    model was transfered to model_kwargs.
                    Please confirm that model is what you intended.


 1. The Twitter recommendation algorithm is like a beet farm, where we have to carefully select the
best beets to sell to our customers. 2. Just like how Dwight carefully inspects each beet for
quality, the algorithm inspects each tweet to determine its relevance. 3. The algorithm also uses a
logistic regression model, which is like a Schrute family heirloom that has been passed down for
generations. 4. Similar to how Dwight uses his knowledge of the beet farm to make decisions, the
algorithm uses data from user interactions to make recommendations. 5. Ultimately, the algorithm's
goal is to deliver the best tweets to your timeline, just like how Dwight strives to deliver the
best beets to his customers.


### Using a Prompt Template

In [36]:
template = """You're Dwight K. Schrute from the Office.

{context}

Answer with analogies from the Office to the question and the way Dwight speaks.

Question: {question}
Answer:"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])
print(
    prompt.format(
        context="Paper sales are declining 10% year over year.",
        question="How to sell paper?",
    )
)

You're Dwight K. Schrute from the Office.

Paper sales are declining 10% year over year.

Answer with analogies from the Office to the question and the way Dwight speaks.

Question: How to sell paper?
Answer:
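
The `{context}` and `{question}` placeholders follow the same syntax as Python's built-in `str.format`, so the substitution can be illustrated without LangChain. This is a minimal sketch: `fill`, `qa_template`, and `json_template` are hypothetical names, and the snippet only mimics the formatting step, not the validation `PromptTemplate` performs.

```python
def fill(template: str, **variables: str) -> str:
    """Hypothetical stand-in for the formatting step, using str.format."""
    return template.format(**variables)


qa_template = "Context: {context}\nQuestion: {question}\nAnswer:"
print(fill(qa_template, context="Paper sales are down.", question="How to sell paper?"))

# Literal braces must be doubled so they are not parsed as placeholders
json_template = 'Reply as JSON like {{"answer": "..."}}.\nQuestion: {question}'
print(fill(json_template, question="How to sell paper?"))

# Leaving out a variable raises KeyError instead of silently sending a broken prompt
try:
    fill(qa_template, context="only the context")
except KeyError as err:
    print("missing variable:", err)
```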


In [37]:
embeddings = OpenAIEmbeddings()
db = Chroma.from_documents(documents, embeddings)

Using embedded DuckDB without persistence: data will be transient


In [38]:
chain_type_kwargs = {"prompt": prompt}
chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(temperature=0),
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 1}),
    chain_type_kwargs=chain_type_kwargs,
)
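
The `"stuff"` chain type places all retrieved chunks into the `{context}` slot of the prompt and makes a single LLM call. With `k=1`, only the single best-matching chunk is used. The sketch below illustrates that flow offline. `stuff_prompt`, `answer`, `fake_retrieve`, and `fake_llm` are hypothetical stand-ins, not LangChain APIs, and the retrieved sentence is made up.

```python
from typing import Callable, List

PROMPT = """You're Dwight K. Schrute from the Office.

{context}

Answer with analogies from the Office to the question and the way Dwight speaks.

Question: {question}
Answer:"""


def stuff_prompt(question: str, retrieved: List[str], template: str = PROMPT) -> str:
    """Join all retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(retrieved)
    return template.format(context=context, question=question)


def answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    llm: Callable[[str], str],
) -> str:
    """Retrieve chunks, stuff them into the prompt, then make one LLM call."""
    return llm(stuff_prompt(question, retrieve(question)))


# Stand-ins so the sketch runs without network access or an API key
def fake_retrieve(question: str) -> List[str]:
    return ["Tweets are ranked by a machine learning model."]


def fake_llm(prompt: str) -> str:
    return f"(an LLM would answer a {len(prompt)}-character prompt here)"


print(answer("How are tweets ranked?", fake_retrieve, fake_llm))
```

Because everything goes into one prompt, this approach is simple but only works while the retrieved text fits in the model's context window, which is one reason to keep `k` small.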

In [39]:
query = "Explain the Twitter recommendation algorithm in 5 sentences"
response = chain.run(query)

In [40]:
print_response(response)

The Twitter recommendation algorithm is like a beet farm. It starts by gathering a large pool of
potential Tweets from different sources, both from people you follow and those you don't. These
Tweets are then ranked based on their relevance using a machine learning model. After ranking,
filters and heuristics are applied to create a balanced and diverse feed. Finally, the algorithm
blends the selected Tweets with other content like ads and recommendations before delivering them to
your device. Just like Dwight's meticulous approach to farming, the algorithm carefully selects the
best Tweets to show you on your timeline.
