<a href="https://colab.research.google.com/github/Engineer-D/Projects/blob/main/Understanding_langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### https://github.com/JohannesJolkkonen/funktio-ai-samples/tree/main/knowledge-graph-demo

In [None]:
!pip install -r requirements.txt

Collecting langchain_openai (from -r requirements.txt (line 1))
  Downloading langchain_openai-0.1.6-py3-none-any.whl (34 kB)
Collecting openai (from -r requirements.txt (line 2))
  Downloading openai-1.25.1-py3-none-any.whl (312 kB)
Collecting graphdatascience (from -r requirements.txt (line 3))
  Downloading graphdatascience-1.10-py3-none-any.whl (1.6 MB)
Collecting retry==0.9.2 (from -r requirements.txt (line 4))
  Downloading retry-0.9.2-py2.py3-none-any.whl (8.0 kB)
Collecting langchain>=0.0.216 (from -r requirements.txt (line 5))
  Downloading langchain-0.1.17-py3-none-any.whl (867 kB)
Collecting streamli

In [1]:
from langchain_community.graphs import Neo4jGraph  # langchain.graphs is a deprecated alias
from langchain.chains import GraphCypherQAChain
from langchain.prompts.prompt import PromptTemplate
from langchain_openai import ChatOpenAI

In [2]:
import dotenv
import os

In [3]:
dotenv.load_dotenv()
openai_api_key = os.getenv("OPEN_AI_SECRET_KEY")
#Neo4j configuration
neo4j_url = os.getenv("NEO4J_URI")
neo4j_user = os.getenv("NEO4J_USERNAME")
neo4j_password = os.getenv("NEO4J_PASSWORD")

In [4]:
llm = ChatOpenAI(model="gpt-3.5-turbo-0125",temperature=0, api_key=openai_api_key)

In [5]:
# Cypher generation prompt
cypher_generation_template = """
You are an expert Neo4j Cypher translator who converts English to Cypher based on the Neo4j Schema provided, following the instructions below:
1. Generate Cypher query compatible ONLY for Neo4j Version 5
2. Do not use EXISTS, SIZE, HAVING keywords in the cypher. Use alias when using the WITH keyword
3. Use only Nodes and relationships mentioned in the schema
4. Always do a case-insensitive and fuzzy search for any properties related search. Eg: to search for a Client, use `toLower(client.id) contains 'neo4j'`. To search for Slack Messages, use `toLower(SlackMessage.text) contains 'neo4j'`. To search for a project, use `toLower(project.summary) contains 'logistics platform' OR toLower(project.name) contains 'logistics platform'`.
5. Never use relationships that are not mentioned in the given schema
6. When asked about projects, Match the properties using case-insensitive matching and the OR-operator, E.g, to find a logistics platform -project, use `toLower(project.summary) contains 'logistics platform' OR toLower(project.name) contains 'logistics platform'`.

schema: {schema}

Examples:
Question: Which client's projects use most of our people?
Answer: ```MATCH (c:CLIENT)<-[:HAS_CLIENT]-(p:Project)-[:HAS_PEOPLE]->(person:Person)
RETURN c.name AS Client, COUNT(DISTINCT person) AS NumberOfPeople
ORDER BY NumberOfPeople DESC```
Question: Which person uses the largest number of different technologies?
Answer: ```MATCH (person:Person)-[:USES_TECH]->(tech:Technology)
RETURN person.name AS PersonName, COUNT(DISTINCT tech) AS NumberOfTechnologies
ORDER BY NumberOfTechnologies DESC```

Question: {question}
"""
cypher_prompt = PromptTemplate(
    template = cypher_generation_template,
    input_variables = ["schema", "question"]
)


In [6]:
CYPHER_QA_TEMPLATE = """You are an assistant that helps to form nice and human understandable answers.
The information part contains the provided information that you must use to construct an answer.
The provided information is authoritative, you must never doubt it or try to use your internal knowledge to correct it.
Make the answer sound as a response to the question. Do not mention that you based the result on the given information.
If the provided information is empty, say that you don't know the answer.
Final answer should be easily readable and structured.
Information:
{context}

Question: {question}
Helpful Answer:"""

qa_prompt = PromptTemplate(
    input_variables=["context", "question"], template=CYPHER_QA_TEMPLATE
)

In [7]:
def query_graph(user_input):
    graph = Neo4jGraph(url=neo4j_url, username=neo4j_user, password=neo4j_password)
    chain = GraphCypherQAChain.from_llm(
        llm=llm,
        graph=graph,
        verbose=True,
        return_intermediate_steps=True,
        cypher_prompt=cypher_prompt,
        qa_prompt=qa_prompt
        )
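    # Calling the chain object directly is deprecated; invoke() returns the same result dict.
    result = chain.invoke({"query": user_input})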
    return result, chain

In [8]:
result, chain = query_graph("What's the ultimate goal")

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (g:Goal)
RETURN g.description AS UltimateGoal
Full Context:
[]

> Finished chain.


In [9]:
result

{'query': "What's the ultimate goal",
 'result': "I don't know the answer.",
 'intermediate_steps': [{'query': 'MATCH (g:Goal)\nRETURN g.description AS UltimateGoal'},
  {'context': []}]}
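
The empty `context` in the result explains the "I don't know" answer: the generated Cypher ran but returned no rows. Here is a minimal sketch of pulling the generated query and the returned rows out of such a result, assuming the dict shape shown in the output above. The helper name `summarize_steps` is hypothetical, not part of LangChain.

```python
def summarize_steps(result):
    """Return (generated_cypher, context_rows) from a result dict shaped like the one above."""
    cypher, rows = None, []
    for step in result.get("intermediate_steps", []):
        if "query" in step:
            cypher = step["query"]
        if "context" in step:
            rows = step["context"]
    return cypher, rows


# Result copied from the cell output above.
sample_result = {
    "query": "What's the ultimate goal",
    "result": "I don't know the answer.",
    "intermediate_steps": [
        {"query": "MATCH (g:Goal)\nRETURN g.description AS UltimateGoal"},
        {"context": []},
    ],
}

cypher, rows = summarize_steps(sample_result)
print(cypher)
if not rows:
    print("The query returned no rows, so the QA prompt had nothing to answer from.")
```

Checking the generated Cypher against the graph schema is usually the quickest way to see why a question came back empty.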

In [17]:
print(chain.qa_chain.prompt.template)

You are an assistant that helps to form nice and human understandable answers.
The information part contains the provided information that you must use to construct an answer.
The provided information is authoritative, you must never doubt it or try to use your internal knowledge to correct it.
Make the answer sound as a response to the question. Do not mention that you based the result on the given information.
If the provided information is empty, say that you don't know the answer.
Final answer should be easily readable and structured.
Information:
{context}

Question: {question}
Helpful Answer:


#### SC test

In [20]:
!pip install --quiet chromadb

In [None]:
from langchain_community.vectorstores import Chroma  # langchain.vectorstores is a deprecated alias
from langchain.chains import RetrievalQA
from langchain_openai import OpenAIEmbeddings

openAIEmbeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

#Are firewall and router configuration standards established and implemented? Please pick the appropriate answer
query = "what is the Firewall Type?"

configurations = [
    "Firewall Configuration:",
    "-----------------------",
    "Firewall Type: Cisco ASA",
    "Security Level: High",
    "Access Control Lists (ACLs):",
    "    - ACL 101: Permit TCP any host 192.168.1.1 eq www",
    "    - ACL 102: Deny IP any any",
    "",
    "Router Configuration:",
    "---------------------",
    "Router Type: Cisco ISR 4000",
    "Routing Protocol: OSPF",
    "Interfaces:",
    "    - GigabitEthernet0/0: 192.168.1.1/24",
    "    - GigabitEthernet0/1: 10.0.0.1/24",
    "    - Serial0/0/0: 172.16.1.1/30",
    "",
    "VPN Configuration:",
    "------------------",
    "VPN Type: Site-to-Site IPsec VPN",
    "Tunnel Mode: Tunnel Mode",
    "Encryption Algorithm: AES-256",
    "Authentication Method: Pre-shared Key",
    "Peer IP Address: 203.0.113.2",
    "Tunnel Interface: Tunnel0",
    "    - IP Address: 192.168.100.1/30",
    "    - Peer IP Address: 203.0.113.2",
]

# Prompt
template = """As a compliance assistant you are to help users answer questions presented to them. where possible, reply with just Yes or No
example:
context: "Firewall, routers and configuration have been implemented in our network security"
Quesion: Has necessary configuration been put in place?
Answer: Yes

Instruction: Don't add '.' after Yes, Also keep your answers brief

Context: {context}
using the context above answer the question {question}
Helpful Answer:"""

# PromptTemplate has no `stream` option, so only the variables and template are passed.
QA_CHAIN_PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)

vectorstore = Chroma.from_texts(texts=configurations, embedding=openAIEmbeddings)
llm = ChatOpenAI(model="gpt-3.5-turbo-0125",temperature=0, api_key=openai_api_key)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)

result = qa_chain.invoke({"query": query})  # invoke() replaces the deprecated direct call
print(result['result'])
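
Conceptually, the retriever finds the stored texts closest to the query, and the chain pastes them into `{context}` before calling the model. Below is a rough, dependency-free sketch of that flow. It assumes word-overlap scoring as a stand-in for OpenAI embeddings and Chroma's vector search, and the function names are hypothetical rather than LangChain APIs.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, texts, k=4):
    """Return up to k texts sharing at least one word with the query, best match first."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(text)), text) for text in texts]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps input order on ties
    return [text for _, text in scored[:k]]


def build_prompt(template, query, texts, k=4):
    """Join the retrieved texts and fill the template's {context} and {question} slots."""
    context = "\n\n".join(retrieve(query, texts, k))
    return template.format(context=context, question=query)


sample_texts = [
    "Firewall Type: Cisco ASA",
    "Router Type: Cisco ISR 4000",
    "Routing Protocol: OSPF",
]
sample_template = "Context: {context}\nQuestion: {question}\nHelpful Answer:"
print(build_prompt(sample_template, "what is the Firewall Type?", sample_texts, k=1))
```

Real embeddings match on meaning rather than shared words, so the actual retriever can surface relevant lines even when the wording differs.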

In [37]:
print(qa_chain.combine_documents_chain.llm_chain.prompt.template)

As a compliance assistant you are to help users answer questions presented to them. where possible, reply with just Yes or No
example:
context: "Firewall, routers and configuration have been implemented in our network security"
Quesion: Has necessary configuration been put in place?
Answer: Yes

Instruction: Don't add '.' after Yes, Also keep your answers brief

Context: {context}
using the context above answer the question {question}
Helpful Answer:


In [43]:
qa_chain.combine_documents_chain.llm_chain

LLMChain(prompt=PromptTemplate(input_variables=['context', 'question'], template='As a compliance assistant you are to help users answer questions presented to them. where possible, reply with just Yes or No\nexample:\ncontext: "Firewall, routers and configuration have been implemented in our network security"\nQuesion: Has necessary configuration been put in place?\nAnswer: Yes\n\nInstruction: Don\'t add \'.\' after Yes, Also keep your answers brief\n\nContext: {context}\nusing the context above answer the question {question}\nHelpful Answer:'), llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x7e33df4a12a0>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x7e33df465ff0>, model_name='gpt-3.5-turbo-0125', temperature=0.0, openai_api_key=SecretStr('**********'), openai_proxy=''))

#### https://www.youtube.com/watch?v=J_0qvRt4LNk&list=PL8motc6AQftk1Bs42EW45kwYbyJ4jOdiZ

In [3]:
!pip -q install openai langchain langchain_openai

In [4]:
from google.colab import userdata
api_key = userdata.get('OPENAI_KEY')

In [27]:
from langchain_openai import ChatOpenAI

In [28]:
llm = ChatOpenAI(model_name='gpt-3.5-turbo',
             temperature=0.9,
             max_tokens = 256,
            api_key = api_key)

In [22]:
from langchain.prompts import PromptTemplate  # top-level `from langchain import ...` is deprecated

In [23]:
restaurant_template = """
I want you to act as a naming consultant for new restaurants.

Return a list of restaurant names. Each name should be short, catchy and easy to remember. It shoud relate to the type of restaurant you are naming.

What are some good names for a restaurant that is {restaurant_desription}?
"""

prompt_template = PromptTemplate(
    input_variables=["restaurant_desription"],
    template=restaurant_template,
)

In [24]:
description = "a Greek place that serves fresh lamb souvlakis and other Greek food "
description_02 = "a burger place that is themed with baseball memorabilia"
description_03 = "a cafe that has live hard rock music and memorabilia"

## to see what the prompt will be like
print(prompt_template.format(restaurant_desription=description))


I want you to act as a naming consultant for new restaurants.

Return a list of restaurant names. Each name should be short, catchy and easy to remember. It shoud relate to the type of restaurant you are naming.

What are some good names for a restaurant that is a Greek place that serves fresh lamb souvlakis and other Greek food ?



In [29]:
## querying the model with the prompt template
from langchain.chains import LLMChain


chain = LLMChain(llm=llm,
                 prompt=prompt_template)

# Run the chain only specifying the input variable.
print(chain.run(description))

1. Lamb & Olive
2. Greek Grill House
3. Souvlaki Spot
4. Gyro Haven
5. Acropolis Eats
6. Zeus' Bites
7. Olive Tree Kitchen
8. Mykonos Grille
9. The Greek Table
10. Hellenic Flavors


In [34]:
# Run the chain only specifying the input variable.
print(chain.run({"restaurant_desription":description_02}))

1. Home Run Burgers
2. Grand Slam Grill
3. The Ballpark Burger Co.
4. Bat & Burger Bistro
5. Diamond Diner
6. Burger Base
7. Triple Play Burgers
8. Slugger's Burgers & Fries
9. The All-Star Burger Joint
10. Batter Up Bites


#### https://github.com/gkamradt/langchain-tutorials/blob/main/LangChain%20Cookbook%20Part%201%20-%20Fundamentals.ipynb

##### LangChain Cookbook 👨‍🍳👩‍🍳

*This cookbook is based off the [LangChain Conceptual Documentation](https://docs.langchain.com/docs/)*

**Goal:** Provide an introductory understanding of the components and use cases of LangChain via [ELI5](https://www.dictionary.com/e/slang/eli5/#:~:text=ELI5%20is%20short%20for%20%E2%80%9CExplain,a%20complicated%20question%20or%20problem.) examples and code snippets. For use cases check out [part 2](https://github.com/gkamradt/langchain-tutorials/blob/main/LangChain%20Cookbook%20Part%202%20-%20Use%20Cases.ipynb). See [video tutorial](https://www.youtube.com/watch?v=2xxziIWmaSA) of this notebook.

 **Links:**  
* [LC Conceptual Documentation](https://docs.langchain.com/docs/)
* [LC Python Documentation](https://python.langchain.com/en/latest/)
* [LC Javascript/Typescript Documentation](https://js.langchain.com/docs/)
* [LC Discord](https://discord.gg/6adMQxSpJS)
* [www.langchain.com](https://langchain.com/)
* [LC Twitter](https://twitter.com/LangChainAI)  

### **What is LangChain?**
> LangChain is a framework for developing applications powered by language models.

**~~TL~~DR**: LangChain makes the complicated parts of working & building with AI models easier. It helps do this in two ways:

1. **Integration** - Bring external data, such as your files, other applications, and API data, to your LLMs
2. **Agency** - Allow your LLMs to interact with their environment via decision making. Use LLMs to help decide which action to take next  

### **Why LangChain?**
1. **Components** - LangChain makes it easy to swap out abstractions and components necessary to work with language models.
2. **Customized Chains** - LangChain provides out of the box support for using and customizing 'chains' - a series of actions strung together.
3. **Speed 🚢** - This team ships insanely fast. You'll be up to date with the latest LLM features.
4. **Community 👥** - Wonderful Discord and community support, meetups, hackathons, etc.

Though LLMs can be straightforward (text-in, text-out), you'll quickly run into friction points that LangChain helps with once you develop more complicated applications.

*Note: This cookbook will not cover all aspects of LangChain. Its contents have been curated to get you to building & impact as quickly as possible. For more, please check out [LangChain Conceptual Documentation](https://docs.langchain.com/docs/)*

*Update Oct '23: This notebook has been expanded from its original form*

You'll need an OpenAI API key to follow this tutorial. You can store it as an environment variable, put it in a .env file where this Jupyter notebook lives, or insert it below where 'YourAPIKey' is. If you have questions on this, put these instructions into [ChatGPT](https://chat.openai.com/).

In [62]:
!pip install --quiet langchain langchain_openai chromadb unstructured


In [1]:
from google.colab import userdata
api_key = userdata.get('OPENAI_KEY')

##### LangChain Components  

Schema - Nuts and Bolts of working with Large Language Models (LLMs)  

Text  
The natural language way to interact with LLMs

In [2]:
# You'll be working with simple strings (that'll soon grow in complexity!)
my_text = "What day comes after Friday?"
my_text

'What day comes after Friday?'

##### Chat Messages
Like text, but specified with a message type (System, Human, AI)

* System - Helpful background context that tells the AI what to do
* Human - Messages that are intended to represent the user
* AI - Messages that show what the AI responded with  

For more, see OpenAI's [documentation](https://platform.openai.com/docs/guides/chat/introduction)

In [9]:
from langchain_openai import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage

# This is the language model we'll use. We'll talk about what we're doing in the next section
chat = ChatOpenAI(temperature=.7, openai_api_key=api_key)

Now let's create a few messages that simulate a chat experience with a bot

In [10]:
chat(
    [
        SystemMessage(content="You are a nice AI bot that helps a user figure out what to eat in one short sentence"),
        HumanMessage(content="I like tomatoes, what should I eat?")
    ]
)

AIMessage(content='You might enjoy a tomato, basil, and mozzarella salad.', response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 39, 'total_tokens': 52}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_a450710239', 'finish_reason': 'stop', 'logprobs': None}, id='run-f1d21361-630f-4aa2-af52-52348183a8ec-0')

You can also pass more chat history w/ responses from the AI

In [11]:
chat(
    [
        SystemMessage(content="You are a nice AI bot that helps a user figure out where to travel in one short sentence"),
        HumanMessage(content="I like the beaches where should I go?"),
        AIMessage(content="You should go to Nice, France"),
        HumanMessage(content="What else should I do when I'm there?")
    ]
)

AIMessage(content='Explore the charming Old Town and enjoy the vibrant street markets in Nice, France.', response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 64, 'total_tokens': 80}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_a450710239', 'finish_reason': 'stop', 'logprobs': None}, id='run-662a715a-4b7a-4ffc-ba00-cf6e3d285b53-0')

You can also exclude the system message if you want

In [12]:
chat(
    [
        HumanMessage(content="What day comes after Thursday?")
    ]
)

AIMessage(content='Friday', response_metadata={'token_usage': {'completion_tokens': 1, 'prompt_tokens': 13, 'total_tokens': 14}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'stop', 'logprobs': None}, id='run-6c876534-e6b2-46ad-923f-5da4893343b7-0')
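
Each message type corresponds to a role in the OpenAI chat format (`system`, `user`, `assistant`). Here is a minimal sketch of that mapping using plain tuples rather than LangChain classes. The `to_openai_messages` helper is hypothetical and only illustrates the shape of the payload.

```python
# Message type on the left, OpenAI chat role on the right.
ROLE_BY_TYPE = {"system": "system", "human": "user", "ai": "assistant"}


def to_openai_messages(messages):
    """Convert (message_type, content) pairs into OpenAI-style role/content dicts."""
    return [{"role": ROLE_BY_TYPE[kind], "content": content} for kind, content in messages]


history = [
    ("system", "You are a nice AI bot that helps a user figure out where to travel in one short sentence"),
    ("human", "I like the beaches where should I go?"),
    ("ai", "You should go to Nice, France"),
    ("human", "What else should I do when I'm there?"),
]

for message in to_openai_messages(history):
    print(message["role"], "->", message["content"])
```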

##### **Documents**  
An object that holds a piece of text and metadata (more information about that text)

In [13]:
from langchain.schema import Document

In [14]:
Document(page_content="This is my document. It is full of text that I've gathered from other places",
         metadata={
             'my_document_id' : 234234,
             'my_document_source' : "The LangChain Papers",
             'my_document_create_time' : 1680013019
         })

Document(page_content="This is my document. It is full of text that I've gathered from other places", metadata={'my_document_id': 234234, 'my_document_source': 'The LangChain Papers', 'my_document_create_time': 1680013019})

But you don't have to include metadata if you don't want to

In [15]:
Document(page_content="This is my document. It is full of text that I've gathered from other places")

Document(page_content="This is my document. It is full of text that I've gathered from other places")

##### Models - The interface to the AI brains
**Chat Model**  
A model that takes a series of messages and returns a message output

In [17]:
chat = ChatOpenAI(temperature=1, openai_api_key=api_key)

In [18]:
chat(
    [
        SystemMessage(content="You are an unhelpful AI bot that makes a joke at whatever the user says"),
        HumanMessage(content="I would like to go to New York, how should I do this?")
    ]
)

AIMessage(content='Just take a giant slingshot and aim for the East Coast!', response_metadata={'token_usage': {'completion_tokens': 14, 'prompt_tokens': 43, 'total_tokens': 57}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'stop', 'logprobs': None}, id='run-8cec0001-2970-4b47-9f77-089a6a6d0a54-0')

##### Function Calling Models
[Function calling models](https://openai.com/blog/function-calling-and-other-api-updates) are similar to Chat Models but with a little extra flavor. They are fine-tuned to give structured data outputs.  

This comes in handy when you're making an API call to an external service or doing extraction.

In [20]:
chat = ChatOpenAI(model='gpt-3.5-turbo-0613', temperature=1, openai_api_key=api_key)

output = chat(messages=
     [
         SystemMessage(content="You are an helpful AI bot"),
         HumanMessage(content="What’s the weather like in Boston right now?")
     ],
     functions=[{
         "name": "get_current_weather",
         "description": "Get the current weather in a given location",
         "parameters": {
             "type": "object",
             "properties": {
                 "location": {
                     "type": "string",
                     "description": "The city and state, e.g. San Francisco, CA"
                 },
                 "unit": {
                     "type": "string",
                     "enum": ["celsius", "fahrenheit"]
                 }
             },
             "required": ["location"]
         }
     }
     ]
)
output

AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\n  "location": "Boston, MA"\n}', 'name': 'get_current_weather'}}, response_metadata={'token_usage': {'completion_tokens': 18, 'prompt_tokens': 91, 'total_tokens': 109}, 'model_name': 'gpt-3.5-turbo-0613', 'system_fingerprint': None, 'finish_reason': 'function_call', 'logprobs': None}, id='run-a9b59cd2-4a59-4cbe-b8b0-d338b79474c7-0')

See the extra additional_kwargs that is passed back to us? We can take that and pass it to an external API to get data. It saves the hassle of doing output parsing.
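
As a rough sketch of that hand-off, the snippet below decodes the `function_call` arguments and routes them to a local function. `get_current_weather` here is a hypothetical stub that returns placeholder data, not a real weather API, and the registry and dispatcher names are assumptions made for illustration.

```python
import json


def get_current_weather(location, unit="fahrenheit"):
    """Hypothetical stand-in for a real weather API call; returns placeholder data."""
    return {"location": location, "unit": unit, "forecast": "unknown (stub)"}


# Registry of functions the model is allowed to request, keyed by name.
AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}


def dispatch_function_call(additional_kwargs):
    """Look up the requested function and call it with the JSON-decoded arguments."""
    call = additional_kwargs["function_call"]
    function = AVAILABLE_FUNCTIONS[call["name"]]
    arguments = json.loads(call["arguments"])
    return function(**arguments)


# additional_kwargs copied from the AIMessage output above.
sample_kwargs = {
    "function_call": {
        "arguments": '{\n  "location": "Boston, MA"\n}',
        "name": "get_current_weather",
    }
}
print(dispatch_function_call(sample_kwargs))
```

Looking the name up in an explicit registry keeps the model from triggering arbitrary code: only functions you listed can be called.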

##### Text Embedding Model
Change your text into a vector (a series of numbers that hold the semantic 'meaning' of your text). Mainly used when comparing two pieces of text together.

*BTW: Semantic means 'relating to meaning in language or logic'.*

In [21]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key=api_key)

In [22]:
text = "Hi! It's time for the beach"

In [23]:
text_embedding = embeddings.embed_query(text)
print (f"Here's a sample: {text_embedding[:5]}...")
print (f"Your embedding is length {len(text_embedding)}")

Here's a sample: [-0.0001879176858456097, -0.0030974280186882763, -0.0010647408232164338, -0.01923793187847746, -0.015148358467339893]...
Your embedding is length 1536
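
Comparing two embeddings usually comes down to a similarity score such as cosine similarity. Here is a minimal sketch using toy 3-dimensional vectors in place of real 1536-dimensional embeddings. The vectors and their names are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of three short texts.
beach = [0.9, 0.1, 0.0]
ocean = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Texts with related meaning should score higher than unrelated ones.
print(cosine_similarity(beach, ocean) > cosine_similarity(beach, invoice))
```

With real embeddings you would pass the lists returned by `embeddings.embed_query` for each text instead of the toy vectors.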


##### Prompts - Text generally used as instructions to your model  
**Prompt**  
What you'll pass to the underlying model

In [28]:
chat = ChatOpenAI(temperature=1, openai_api_key=api_key)

# I like to use three double quotation marks for my prompts because it's easier to read
prompt = """
Today is Monday, tomorrow is Wednesday.

What is wrong with that statement?
"""

chat.invoke(prompt)

AIMessage(content='The statement is incorrect. Tomorrow is Tuesday, not Wednesday.', response_metadata={'token_usage': {'completion_tokens': 12, 'prompt_tokens': 23, 'total_tokens': 35}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'stop', 'logprobs': None}, id='run-7f7252a6-11c8-447c-a2c4-763dc4222bdc-0')

##### Prompt Template
An object that helps create prompts based on a combination of user input, other non-static information and a fixed template string.

Think of it as an f-string in python but for prompts

Advanced: Check out [LangSmith Hub](https://smith.langchain.com/hub) for many more community prompt templates

In [30]:
from langchain.prompts import PromptTemplate  # top-level `from langchain import ...` is deprecated

# Notice "location" below, that is a placeholder for another value later
template = """
I really want to travel to {location}. What should I do there?

Respond in one short sentence
"""

prompt = PromptTemplate(
    input_variables=["location"],
    template=template,
)

final_prompt = prompt.format(location='Rome')

print (f"Final Prompt: {final_prompt}")
print ("-----------")
print (f"LLM Output: {chat.invoke(final_prompt)}")

Final Prompt: 
I really want to travel to Rome. What should I do there?

Respond in one short sentence

-----------
LLM Output: content='Visit iconic landmarks such as the Colosseum, Vatican City, and Trevi Fountain.' response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 28, 'total_tokens': 47}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_a450710239', 'finish_reason': 'stop', 'logprobs': None} id='run-4bfaca80-1c77-4b66-85d0-cf3238f99d91-0'
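
The f-string analogy can be made concrete with plain `str.format`. This is only a sketch of the idea, not how `PromptTemplate` is implemented, and the `fill` helper and `travel_template` name are hypothetical.

```python
travel_template = """
I really want to travel to {location}. What should I do there?

Respond in one short sentence
"""


def fill(template_text, **values):
    """Fill {placeholders} in a template string, failing loudly if one is missing."""
    try:
        return template_text.format(**values)
    except KeyError as missing:
        raise ValueError(f"Missing value for placeholder {missing}") from None


print(fill(travel_template, location="Rome"))
```

Declaring `input_variables` up front serves a similar purpose: the template knows which values it needs before it is formatted.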


##### Example Selectors
An easy way to select from a series of examples that allows you to dynamically place in-context information into your prompt. Often used when your task is nuanced or you have a large list of examples.

Check out different types of example selectors [here](https://python.langchain.com/docs/modules/model_io/prompts/example_selectors/)

If you want an overview on why examples are important (prompt engineering), check out [this video](https://www.youtube.com/watch?v=dOxUroR57xs)

In [32]:
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain_community.vectorstores import Chroma  # langchain.vectorstores is a deprecated alias
from langchain_openai import OpenAIEmbeddings
from langchain.prompts import FewShotPromptTemplate, PromptTemplate


example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Example Input: {input}\nExample Output: {output}",
)

# Examples of locations where nouns are found
examples = [
    {"input": "pirate", "output": "ship"},
    {"input": "pilot", "output": "plane"},
    {"input": "driver", "output": "car"},
    {"input": "tree", "output": "ground"},
    {"input": "bird", "output": "nest"},
]

In [35]:
# SemanticSimilarityExampleSelector will select examples that are similar to your input by semantic meaning

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples,

    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(openai_api_key=api_key),

    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.
    Chroma,

    # This is the number of examples to produce.
    k=2
)

In [36]:
similar_prompt = FewShotPromptTemplate(
    # The object that will help select examples
    example_selector=example_selector,

    # Your prompt
    example_prompt=example_prompt,

    # Customizations that will be added to the top and bottom of your prompt
    prefix="Give the location an item is usually found in",
    suffix="Input: {noun}\nOutput:",

    # What inputs your prompt will receive
    input_variables=["noun"],
)

In [37]:
# Select a noun!
my_noun = "plant"
# my_noun = "student"

print(similar_prompt.format(noun=my_noun))

Give the location an item is usually found in

Example Input: tree
Example Output: ground

Example Input: bird
Example Output: nest

Input: plant
Output:


In [38]:
chat.invoke(similar_prompt.format(noun=my_noun))

AIMessage(content='soil', response_metadata={'token_usage': {'completion_tokens': 2, 'prompt_tokens': 43, 'total_tokens': 45}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_a450710239', 'finish_reason': 'stop', 'logprobs': None}, id='run-4ae8703d-15a7-4a8f-b579-de67a55117a9-0')
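
Once the selector has picked its examples, assembling the few-shot prompt is plain string joining. Here is a minimal sketch that reproduces the formatted prompt printed above, assuming the pieces are joined with a blank line between them. `format_few_shot` is a hypothetical helper, not a LangChain function.

```python
def format_few_shot(prefix, examples, suffix, example_template, separator="\n\n", **inputs):
    """Join the prefix, each rendered example, and the filled suffix with a separator."""
    rendered = [example_template.format(**example) for example in examples]
    return separator.join([prefix, *rendered, suffix.format(**inputs)])


# The two examples the selector returned for "plant" in the output above.
selected = [
    {"input": "tree", "output": "ground"},
    {"input": "bird", "output": "nest"},
]

print(
    format_few_shot(
        prefix="Give the location an item is usually found in",
        examples=selected,
        suffix="Input: {noun}\nOutput:",
        example_template="Example Input: {input}\nExample Output: {output}",
        noun="plant",
    )
)
```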

##### Output Parsers Method 1: Prompt Instructions & String Parsing
A helpful way to format the output of a model. Usually used for structured output. LangChain lists many more output parsers in its documentation.

Two big concepts:

1. Format Instructions - An autogenerated prompt that tells the LLM how to format its response based on your desired result

2. Parser - A method which will extract your model's text output into a desired structure (usually json)

In [39]:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate

In [40]:
# How you would like your response structured. This is basically a fancy prompt template
response_schemas = [
    ResponseSchema(name="bad_string", description="This a poorly formatted user input string"),
    ResponseSchema(name="good_string", description="This is your response, a reformatted response")
]

# How you would like to parse your output
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [41]:
# See the prompt template you created for formatting
format_instructions = output_parser.get_format_instructions()
print (format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This a poorly formatted user input string
	"good_string": string  // This is your response, a reformatted response
}
```


In [42]:
template = """
You will be given a poorly formatted string from a user.
Reformat it and make sure all the words are spelled correctly

{format_instructions}

% USER INPUT:
{user_input}

YOUR RESPONSE:
"""

prompt = PromptTemplate(
    input_variables=["user_input"],
    partial_variables={"format_instructions": format_instructions},
    template=template
)

promptValue = prompt.format(user_input="welcom to califonya!")

print(promptValue)


You will be given a poorly formatted string from a user.
Reformat it and make sure all the words are spelled correctly

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This a poorly formatted user input string
	"good_string": string  // This is your response, a reformatted response
}
```

% USER INPUT:
welcom to califonya!

YOUR RESPONSE:



In [43]:
llm_output = chat.invoke(promptValue)
llm_output

AIMessage(content='```json\n{\n\t"bad_string": "welcom to califonya!",\n\t"good_string": "welcome to california!"\n}\n```', response_metadata={'token_usage': {'completion_tokens': 30, 'prompt_tokens': 116, 'total_tokens': 146}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'stop', 'logprobs': None}, id='run-67ba7708-e359-4b5e-80f6-f4f6e629c6da-0')
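
The cell above stops at the raw `AIMessage`, so the actual parsing step isn't shown. In the notebook, `output_parser.parse(llm_output.content)` should hand you back a plain dict. Here is a rough, stdlib-only sketch of what that step amounts to. The helper name `parse_fenced_json` is made up for illustration, and the real parser may differ in its details.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def parse_fenced_json(text: str) -> dict:
    """Pull the first json-fenced block out of the text and decode it."""
    pattern = re.escape(FENCE) + r"json\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("no fenced json block found in model output")
    return json.loads(match.group(1))


# Content string copied from the AIMessage above
content = (
    FENCE + 'json\n{\n\t"bad_string": "welcom to califonya!",'
    '\n\t"good_string": "welcome to california!"\n}\n' + FENCE
)

parsed = parse_fenced_json(content)
print(parsed["good_string"])
```

Once the output is a dict, the rest of your code can index into it instead of doing string surgery on the model's reply.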

##### Output Parsers Method 2: OpenAI Functions
When OpenAI released function calling, the game changed. This is the recommended method when starting out.

They trained models specifically for outputting structured data. It became super easy to specify a Pydantic schema and get a structured output.

There are many ways to define your schema. I prefer using Pydantic models because of how organized they are. Feel free to reference OpenAI's [documentation](https://platform.openai.com/docs/guides/gpt/function-calling) for other methods.

To use this method you'll need a model that supports [function calling](https://openai.com/blog/function-calling-and-other-api-updates#:~:text=Developers%20can%20now%20describe%20functions%20to%20gpt%2D4%2D0613%20and%20gpt%2D3.5%2Dturbo%2D0613%2C). I'll use gpt-4-0613.

Example 1: Simple

Let's get started by defining a simple model for us to extract from.

In [44]:
from langchain.pydantic_v1 import BaseModel, Field
from typing import Optional

class Person(BaseModel):
    """Identifying information about a person."""

    name: str = Field(..., description="The person's name")
    age: int = Field(..., description="The person's age")
    fav_food: Optional[str] = Field(None, description="The person's favorite food")

Then let's create a chain (more on this later) that will do the extracting for us

In [48]:
from langchain.chains.openai_functions import create_structured_output_chain

llm = ChatOpenAI(model='gpt-4-0613', openai_api_key=api_key)

chain = create_structured_output_chain(Person, llm, prompt)
chain.run(
    "Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally."
)


Person(name='Sally', age=13, fav_food='spinach')

Notice how we only have data on one person from that list? That is because we didn't specify we wanted multiple. Let's change our schema to specify that we want a list of people if possible.

In [49]:
from typing import Sequence

class People(BaseModel):
    """Identifying information about all people in a text."""

    people: Sequence[Person] = Field(..., description="The people in the text")

Now we'll call for People rather than Person

In [50]:
chain = create_structured_output_chain(People, llm, prompt)
chain.run(
    "Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally."
)

People(people=[Person(name='Sally', age=13, fav_food=''), Person(name='Joey', age=12, fav_food='spinach'), Person(name='Caroline', age=23, fav_food='')])
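
To make the nested schema a bit more concrete, here is a minimal sketch of turning a JSON payload shaped like `People` into typed objects. It uses stdlib dataclasses as a stand-in for Pydantic, and the payload below is a hypothetical example written by hand, not something returned by the API.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    name: str
    age: int
    fav_food: Optional[str] = None  # optional, like in the Pydantic model


@dataclass
class People:
    people: List[Person]


def people_from_json(raw: str) -> People:
    """Decode a JSON string shaped like the People schema."""
    data = json.loads(raw)
    return People(people=[Person(**item) for item in data["people"]])


# Hypothetical payload, hand-written to match the schema
raw_arguments = json.dumps({
    "people": [
        {"name": "Sally", "age": 13},
        {"name": "Joey", "age": 12, "fav_food": "spinach"},
        {"name": "Caroline", "age": 23},
    ]
})

result = people_from_json(raw_arguments)
print(result.people[1])
```

Unlike Pydantic, dataclasses won't check field types for you, so treat this as a picture of the data shape rather than real validation.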

Let's do some more parsing with it

Example 2: Enum

Now let's parse when a product from a list is mentioned

In [52]:
import enum

class Product(str, enum.Enum):
    CRM = "CRM"
    VIDEO_EDITING = "VIDEO_EDITING"
    HARDWARE = "HARDWARE"

In [53]:
class Products(BaseModel):
    """Identifying products that were mentioned in a text"""

    products: Sequence[Product] = Field(..., description="The products mentioned in a text")

In [54]:
chain = create_structured_output_chain(Products, llm, prompt)
chain.run(
    "The CRM in this demo is great. Love the hardware. The microphone is also cool. Love the video editing"
)

Products(products=[<Product.CRM: 'CRM'>, <Product.HARDWARE: 'HARDWARE'>, <Product.VIDEO_EDITING: 'VIDEO_EDITING'>])
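
Notice the microphone didn't make it into the result: the schema only allows three values. The chain enforces that through the schema it sends to the model, but you can see the same idea with the enum alone. This is a small sketch; `to_products` is a made-up helper and the label list is a hand-written example.

```python
import enum
from typing import List


class Product(str, enum.Enum):
    CRM = "CRM"
    VIDEO_EDITING = "VIDEO_EDITING"
    HARDWARE = "HARDWARE"


def to_products(labels: List[str]) -> List[Product]:
    """Keep only the labels that are valid Product members."""
    valid = []
    for label in labels:
        try:
            valid.append(Product(label))
        except ValueError:
            # Not one of the allowed values, e.g. "MICROPHONE"
            continue
    return valid


products = to_products(["CRM", "HARDWARE", "MICROPHONE", "VIDEO_EDITING"])
print(products)
```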

##### Indexes - Structuring documents so LLMs can work with them
**Document Loaders**  
Easy ways to import data from other sources. They share functionality with [OpenAI Plugins, specifically retrieval plugins](https://github.com/openai/chatgpt-retrieval-plugin).

See a [big list](https://python.langchain.com/en/latest/modules/indexes/document_loaders.html) of document loaders here. A bunch more on [Llama Index](https://llamahub.ai/) as well.  

HackerNews

In [55]:
from langchain.document_loaders import HNLoader

In [56]:
loader = HNLoader("https://news.ycombinator.com/item?id=34422627")

In [57]:
data = loader.load()

In [58]:
print (f"Found {len(data)} comments")
print (f"Here's a sample:\n\n{''.join([x.page_content[:150] for x in data[:2]])}")

Found 76 comments
Here's a sample:

Ozzie_osman on Jan 18, 2023  
             | next [–] 

LangChain is awesome. For people not sure what it's doing, large language models (LLMs) are veOzzie_osman on Jan 18, 2023  
             | parent | next [–] 

Also, another library to check out is GPT Index (https://github.com/jerryjliu/gpt_ind


Books from Project Gutenberg

In [59]:
from langchain.document_loaders import GutenbergLoader

loader = GutenbergLoader("https://www.gutenberg.org/cache/epub/2148/pg2148.txt")

data = loader.load()

In [60]:
print(data[0].page_content[1855:1984])

 nimio.—_Seneca_.





      At Paris, just after dark one gusty evening in the autumn of 18-,


      I was enjoying the twof


URLs and webpages

Let's try it out with [Paul Graham's website](http://www.paulgraham.com/)

In [63]:
from langchain.document_loaders import UnstructuredURLLoader

urls = [
    "http://www.paulgraham.com/",
]

loader = UnstructuredURLLoader(urls=urls)

data = loader.load()

data[0].page_content

'New: How to Start Google | Best Essay | Superlinear Want to start a startup? Get funded by Y Combinator . © mmxxiv pg'

##### Text Splitters
Oftentimes your document is too long (like a book) for your LLM. You need to split it up into chunks. Text splitters help with this.

There are many ways you could split your text into chunks. Experiment with [different ones](https://python.langchain.com/en/latest/modules/indexes/text_splitters.html) to see which is best for you.

In [64]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [66]:
# This is a long document we can split up.
with open('data/PaulGrahamEssays/worked.txt') as f:
    pg_work = f.read()

print (f"You have {len([pg_work])} document")

In [None]:
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size = 150,
    chunk_overlap  = 20,
)

texts = text_splitter.create_documents([pg_work])

In [None]:
print (f"You have {len(texts)} documents")

In [None]:
print ("Preview:")
print (texts[0].page_content, "\n")
print (texts[1].page_content)
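
To get a feel for what `chunk_size` and `chunk_overlap` mean, here is a bare-bones sliding-window splitter. This is NOT how `RecursiveCharacterTextSplitter` works internally (that one tries to split on separators like paragraphs and sentences first); it's just a sketch of the size/overlap idea, and `split_with_overlap` is a made-up name.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

The overlap means the end of one chunk is repeated at the start of the next, which helps avoid cutting a thought in half right at a chunk boundary.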

##### Retrievers
An easy way to combine documents with language models.

There are many different types of retrievers. The most widely supported is the VectorStoreRetriever.

In [None]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

loader = TextLoader('data/PaulGrahamEssays/worked.txt')
documents = loader.load()

In [None]:
# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)

# Split your docs into texts
texts = text_splitter.split_documents(documents)

# Get embedding engine ready
embeddings = OpenAIEmbeddings(openai_api_key=api_key)

# Embed your texts
db = FAISS.from_documents(texts, embeddings)

In [None]:
# Init your retriever. It returns several documents by default; pass search_kwargs={"k": 1} to ask for just 1 back
retriever = db.as_retriever()

In [None]:
retriever

In [None]:
docs = retriever.get_relevant_documents("what types of things did the author want to build?")

In [None]:
print("\n\n".join([x.page_content[:200] for x in docs[:2]]))
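
Under the hood, a vector store retriever embeds your query and returns the documents whose embeddings are closest to it. Here is a toy version of that idea with no API calls. The word-count "embedding" is a stand-in for a real embedding model, and the sample sentences are invented for the example, not taken from the essay.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


docs = [
    "I wanted to build things that would last",
    "The weather in Florence was hot that summer",
    "Painting taught me how to see",
]
top_docs = retrieve("what did the author want to build", docs, k=1)
print(top_docs)
```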

##### VectorStores
Databases to store vectors. The most popular ones are Pinecone & Weaviate. More examples are in OpenAI's retriever documentation. Chroma & FAISS are easy to work with locally.

Conceptually, think of them as tables w/ a column for embeddings (vectors) and a column for metadata.

Example  

Embedding________________________________________________Metadata  
[-0.00015641732898075134, -0.003165106289088726, ...]	{'date' : '1/2/23'}  
[-0.00035465431654651654, 1.4654131651654516546, ...]	{'date' : '1/3/23'}  

In [None]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

loader = TextLoader('data/PaulGrahamEssays/worked.txt')
documents = loader.load()

# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)

# Split your docs into texts
texts = text_splitter.split_documents(documents)

# Get embedding engine ready
embeddings = OpenAIEmbeddings(openai_api_key=api_key)

In [None]:
print (f"You have {len(texts)} documents")

In [None]:
embedding_list = embeddings.embed_documents([text.page_content for text in texts])

In [None]:
print (f"You have {len(embedding_list)} embeddings")
print (f"Here's a sample of one: {embedding_list[0][:3]}...")

Your vectorstore stores your embeddings (☝️) and makes them easily searchable
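
Here is the "table with an embeddings column and a metadata column" picture as a tiny in-memory class. It's only a sketch: `TinyVectorStore` is a made-up name, the two-dimensional vectors are toy values, and real stores like FAISS use far smarter indexing than a full sort.

```python
import math
from typing import Dict, List, Tuple


class TinyVectorStore:
    """Rows of (embedding, metadata), searched by Euclidean distance."""

    def __init__(self) -> None:
        self.rows: List[Tuple[List[float], Dict[str, str]]] = []

    def add(self, embedding: List[float], metadata: Dict[str, str]) -> None:
        self.rows.append((embedding, metadata))

    def search(self, query: List[float], k: int = 1) -> List[Dict[str, str]]:
        """Return the metadata of the k rows closest to the query vector."""
        ranked = sorted(self.rows, key=lambda row: math.dist(query, row[0]))
        return [metadata for _, metadata in ranked[:k]]


store = TinyVectorStore()
store.add([0.0, 1.0], {"date": "1/2/23"})
store.add([1.0, 0.0], {"date": "1/3/23"})

nearest = store.search([0.9, 0.1], k=1)
print(nearest)
```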

##### Memory  
Helping LLMs remember information.

Memory is a bit of a loose term. It could be as simple as remembering information you've chatted about in the past or more complicated information retrieval.

We'll focus on the chat message use case, which is what chatbots rely on.

There are many types of memory, explore [the documentation](https://python.langchain.com/en/latest/modules/memory/how_to_guides.html) to see which one fits your use case.  

**Chat Message History**

In [67]:
from langchain.memory import ChatMessageHistory

chat = ChatOpenAI(temperature=0, openai_api_key=api_key)

history = ChatMessageHistory()

history.add_ai_message("hi!")

history.add_user_message("what is the capital of france?")

In [68]:
history.messages

[AIMessage(content='hi!'),
 HumanMessage(content='what is the capital of france?')]

In [69]:
ai_response = chat.invoke(history.messages)
ai_response

AIMessage(content='The capital of France is Paris.', response_metadata={'token_usage': {'completion_tokens': 7, 'prompt_tokens': 20, 'total_tokens': 27}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3b956da36b', 'finish_reason': 'stop', 'logprobs': None}, id='run-de2fef8f-1ce7-473c-be97-e0d873a83639-0')

In [70]:
history.add_ai_message(ai_response.content)
history.messages

[AIMessage(content='hi!'),
 HumanMessage(content='what is the capital of france?'),
 AIMessage(content='The capital of France is Paris.')]
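
The pattern here is simple: append the user's message, send the whole history to the model, then append the reply. Here is a sketch of that loop with no API calls. `fake_chat` is a hypothetical stand-in for the chat model, and messages are plain `(role, content)` tuples rather than LangChain message objects.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)


def fake_chat(messages: List[Message]) -> str:
    """Hypothetical stand-in for a chat model: reports how much it was shown."""
    return f"I have seen {len(messages)} messages so far."


history: List[Message] = []


def ask(question: str) -> str:
    """Record the question, call the model with full history, record the answer."""
    history.append(("human", question))
    answer = fake_chat(history)  # the model sees every earlier turn
    history.append(("ai", answer))
    return answer


first = ask("what is the capital of france?")
second = ask("and of italy?")
print(second)
```

Because the full history is sent on every turn, the second question can lean on the first one for context. That is all "memory" means in this simple case.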

##### **Chains** ⛓️⛓️⛓️
Combining different LLM calls and actions automatically

Ex: Summary #1, Summary #2, Summary #3 > Final Summary

Check out [this video](https://www.youtube.com/watch?v=f9_BWhCI4Zo&t=2s) explaining different summarization chain types

There are [many applications of chains](https://python.langchain.com/en/latest/modules/chains/how_to_guides.html). Search through them to see which are best for your use case.

We'll cover two of them:

1. Simple Sequential Chains
Easy chains where you can use the output of an LLM as an input into another. Good for breaking up tasks (and keeping your LLM focused)

In [71]:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import SimpleSequentialChain

In [72]:
template = """Your job is to come up with a classic dish from the area that the user suggests.
% USER LOCATION
{user_location}

YOUR RESPONSE:
"""
prompt_template = PromptTemplate(input_variables=["user_location"], template=template)

# Holds my 'location' chain
location_chain = LLMChain(llm=llm, prompt=prompt_template)


In [73]:
template = """Given a meal, give a short and simple recipe on how to make that dish at home.
% MEAL
{user_meal}

YOUR RESPONSE:
"""
prompt_template = PromptTemplate(input_variables=["user_meal"], template=template)

# Holds my 'meal' chain
meal_chain = LLMChain(llm=llm, prompt=prompt_template)

In [74]:
overall_chain = SimpleSequentialChain(chains=[location_chain, meal_chain], verbose=True)
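
Conceptually, a simple sequential chain just pipes each step's output into the next step's input. Here is a sketch of that in plain Python. The two step functions are hypothetical stand-ins for the LLM-backed chains, and `run_sequential` is a made-up helper, not part of LangChain.

```python
from typing import Callable, List

Step = Callable[[str], str]


def location_step(user_location: str) -> str:
    """Hypothetical stand-in for the 'location' chain."""
    dishes = {"Rome": "Carbonara Pasta"}
    return dishes.get(user_location, "a local dish")


def meal_step(user_meal: str) -> str:
    """Hypothetical stand-in for the 'meal' chain."""
    return f"Recipe for {user_meal}: (steps would go here)"


def run_sequential(steps: List[Step], first_input: str) -> str:
    """Feed first_input to the first step, then pass each output along."""
    value = first_input
    for step in steps:
        value = step(value)  # each output becomes the next input
    return value


result = run_sequential([location_step, meal_step], "Rome")
print(result)
```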

In [75]:
review = overall_chain.run("Rome")



> Entering new SimpleSequentialChain chain...
Carbonara Pasta
Ingredients:
- 200g spaghetti
- 100g pancetta
- 2 large eggs
- 50g pecorino cheese
- Freshly ground black pepper
- Salt

Instructions:
1. Cook spaghetti in a large pot of boiling salted water, until al dente.
2. Meanwhile, fry the pancetta in a hot pan until it's crispy.
3. In a separate bowl, beat the eggs and mix in about half of the cheese.
4. Once the spaghetti is ready, drain it quickly, reserving some of the pasta water.
5. Add the drained pasta to the pan with the pancetta. Mix well to coat in the pancetta fat.
6. Still in the pan, but away from the heat, add the beaten egg and cheese mixture to the pasta and stir quickly until the eggs thicken and create a creamy sauce. If it's too thick, add some of the reserved pasta water.
7. Season with salt and a generous amount of black pepper.
8. Serve it immediately, sprinkled with the rest of the cheese.

> Finished chain.


2. Summarization Chain
Easily run through numerous long documents and get a summary. Check out [this video](https://www.youtube.com/watch?v=f9_BWhCI4Zo) for other chain types besides map-reduce

In [77]:
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = TextLoader('data/PaulGrahamEssays/disc.txt')
documents = loader.load()

# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=700, chunk_overlap=50)

# Split your docs into texts
texts = text_splitter.split_documents(documents)

# There is a lot of complexity hidden in this one line. I encourage you to check out the video above for more detail
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
chain.run(texts)
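
To unpack a little of what map-reduce means here: summarize every chunk on its own (map), then summarize the combined summaries (reduce). A rough sketch of that shape is below. `first_words` is a toy stand-in for an LLM summarization call, and none of these names come from LangChain.

```python
from typing import Callable, List


def first_words(text: str, n: int) -> str:
    """Toy 'summarizer': keep only the first n words (stand-in for an LLM)."""
    return " ".join(text.split()[:n])


def map_reduce(
    chunks: List[str],
    map_fn: Callable[[str], str],
    reduce_fn: Callable[[str], str],
) -> str:
    """Summarize each chunk, then summarize the joined partial summaries."""
    partials = [map_fn(chunk) for chunk in chunks]  # map step
    return reduce_fn("\n".join(partials))           # reduce step


chunks = ["alpha beta gamma delta", "one two three four"]
final = map_reduce(
    chunks,
    map_fn=lambda chunk: first_words(chunk, 2),
    reduce_fn=lambda joined: first_words(joined, 3),
)
print(final)
```

The payoff of this shape is that no single call ever has to see the whole document, only one chunk or the (much shorter) list of partial summaries.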

##### Agents 🤖🤖
Official LangChain Documentation describes agents perfectly (emphasis mine):

    Some applications will require not just a predetermined chain of calls to
    LLMs/other tools, but potentially an unknown chain that depends on the
    user's input. In these types of chains, there is a “agent” which has access
    to a suite of tools. Depending on the user input, the agent can then decide
    which, if any, of these tools to call.

Basically you use the LLM not just for text output, but also for decision making. The coolness and power of this functionality can't be overstated.

Sam Altman emphasizes that LLMs are good '[reasoning engines](https://www.youtube.com/watch?v=L_Guz73e6fw&t=867s)'. Agents take advantage of this.  
  
Agents  
The language model that drives decision making.

More specifically, an agent takes in an input and returns a response corresponding to an action to take along with an action input. You can see different types of agents (which are better for different use cases) [here](https://python.langchain.com/en/latest/modules/agents/agents/agent_types.html).  
  
Tools  
A 'capability' of an agent. This is an abstraction on top of a function that makes it easy for LLMs (and agents) to interact with it. Ex: Google search.  
  
This area shares commonalities with [OpenAI plugins](https://platform.openai.com/docs/plugins/introduction).  
  
Toolkit  
Groups of tools that your agent can select from  
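
To make the three terms concrete, here is a toy version of a single agent step in plain Python. Everything in it is an assumption for illustration: `choose_action` is a rule-based stand-in for the LLM's decision, and both tools are stubs. A real agent would also repeat this step, feeding each observation back to the model until it settles on a final answer.

```python
from typing import Callable, Dict, Tuple


def search_stub(query: str) -> str:
    """Hypothetical stand-in for a web search tool."""
    return f"(pretend search results for: {query})"


def word_count(text: str) -> str:
    """Toy tool: count the words in the input."""
    return str(len(text.split()))


# Toolkit: the group of tools the agent can select from
toolkit: Dict[str, Callable[[str], str]] = {
    "search": search_stub,
    "word_count": word_count,
}


def choose_action(user_input: str) -> Tuple[str, str]:
    """Stand-in for the LLM's decision: returns (tool name, tool input)."""
    if user_input.lower().startswith("count words:"):
        return "word_count", user_input.split(":", 1)[1].strip()
    return "search", user_input


def run_agent(user_input: str) -> str:
    """One decide-then-act step: pick a tool, call it, report the observation."""
    tool_name, tool_input = choose_action(user_input)
    observation = toolkit[tool_name](tool_input)
    return f"[{tool_name}] {observation}"


answer = run_agent("count words: the quick brown fox")
print(answer)
```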
  
Let's bring them all together:

In [78]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
import json

In [79]:
serpapi_api_key=os.getenv("SERP_API_KEY", "YourAPIKey")

In [85]:
toolkit = load_tools(["serpapi"], llm=llm, serpapi_api_key=serpapi_api_key)

In [None]:
agent = initialize_agent(toolkit, llm, agent="zero-shot-react-description", verbose=True, return_intermediate_steps=True)

In [None]:
response = agent({"input":"what was the first album of the "
                    "band that Natalie Bergman is a part of?"})