LEARNING LANGCHAIN BY USING LANGCHAIN (PYTHON)
===============================================

BY: ELLIOTT RISCH

DATE: 2023-10-11

In [None]:
# Before you begin, create a .env file in the same directory as this notebook.
# The .env file should contain the following lines, with your own API keys filled in:

OPENAI_API_KEY="sk-XXXXXX"

PINECONE_ENV="xx-xxxxx-xxx-free"

PINECONE_API_KEY="xxxxx-xxxx-xxxx-xxxx-xxxxx"

# You should also install the required packages listed in requirements.txt.

***I. SETTING UP AND USING --> ChatOpenAI***

**STEP 1:** Find and load the .env file containing the relevant API keys.

In [1]:
from dotenv import load_dotenv, find_dotenv # for .env file
load_dotenv(find_dotenv()) # load environment variables from .env file

# Should read "True" below, if the .env file was loaded correctly.

True
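
A `True` result does not by itself guarantee that every key this notebook needs is present. A minimal sketch of an extra check, assuming the three key names listed at the top of this notebook; the helper name `missing_keys` is hypothetical and not part of python-dotenv:

```python
import os

# Key names taken from the .env description at the top of this notebook.
REQUIRED_KEYS = ["OPENAI_API_KEY", "PINECONE_ENV", "PINECONE_API_KEY"]

def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required names that are absent or empty in `env`."""
    return [name for name in required if not env.get(name)]

# Report which keys are missing without printing any secret values.
print("Missing keys:", missing_keys(os.environ))
```

An empty list means all three variables are set and non-empty.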

**STEP 2:** Run a quick test query to confirm that the OpenAI API key has been loaded correctly from the .env file.

In [3]:
from langchain.llms import OpenAI # Import the model class
llm = OpenAI(model_name="text-davinci-003") # Create a new model instance
llm("explain large language models in one sentence") # Call the model with a prompt

# See the system's response below:

'\n\nLarge language models use large amounts of data to learn the probability of a word given its context in a given language.'

**STEP 3:** Import the classes needed to talk with an OpenAI chat model, e.g., GPT-3.5 or GPT-4.

In [5]:
from langchain.schema import ( # "Schema" here means the message types that structure the conversation between the human and the system.
    AIMessage,  # What the system says to you.
    HumanMessage, # What you use to pose your prompt to the system.
    SystemMessage, # What you use to tell the system its role.
)
from langchain.chat_models import ChatOpenAI # From langchain's chat_models module, import the ChatOpenAI class, which consumes these message types.
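
As a rough mental model, the three message classes correspond to the role-tagged messages that OpenAI's chat endpoint accepts: `SystemMessage` to "system", `HumanMessage` to "user", and `AIMessage` to "assistant". The sketch below mimics that shape in plain Python. The `Message` dataclass and `to_api_payload` helper are hypothetical stand-ins for illustration, not LangChain's implementation:

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str      # "system", "user", or "assistant"
    content: str   # the text of the message

def to_api_payload(messages):
    """Convert messages into a list of role/content dictionaries."""
    return [{"role": m.role, "content": m.content} for m in messages]

conversation = [
    Message("system", "You are an expert data scientist."),
    Message("user", "Explain overfitting in one sentence."),
]
payload = to_api_payload(conversation)
print(payload)
```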

**STEP 4:** Create a new instance of the OpenAI chat model class, create a list of the components of the chat, and then call the chat function to generate a response.

In [6]:
chat = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0.3) # Create a new chat instance
messages = [ # Create a list of messages to send to the system
        SystemMessage(content="You are an expert data scientist."), # Tell the system its role
        HumanMessage(content="Write a python script that trains a neural network on simulated data.") # Pose a prompt to the system
]
response = chat(messages) # Send the messages to the system

**STEP 5:** Display the system's response.

In [7]:
print(response.content,end='\n') # Print the system's response

Sure! Here's an example of a Python script that trains a neural network on simulated data using the Keras library:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Generate simulated data
np.random.seed(0)
X = np.random.rand(1000, 10)
y = np.random.randint(2, size=(1000, 1))

# Define the model architecture
model = Sequential()
model.add(Dense(32, input_dim=10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=10, batch_size=32)

# Evaluate the model
loss, accuracy = model.evaluate(X, y)
print(f"Loss: {loss}")
print(f"Accuracy: {accuracy}")
```

In this script, we first generate simulated data using `numpy`. We create a random matrix `X` of shape `(1000, 10)` and a random binary vector `y` of shape `(1000, 1)`.

Then, we define the neural network model using `Sequential` from Keras

***II. SETTING UP AND USING --> PromptTemplates***

**STEP 1:** Import the classes needed to talk with an OpenAI model and to use the PromptTemplate class, then define a prompt template.

In [8]:
from langchain.llms import OpenAI # Import the model class
from langchain import PromptTemplate # Import the PromptTemplate class, which allows you to create prompts with variables.

template = """
You are an expert data scientist with an expertise in building deep learning models.
Explain the concept {concept} in a couple of sentences.
""" # Create a template for a prompt, with a variable "concept" that can be filled in later.

prompt = PromptTemplate( # Create a new prompt instance
    input_variables=["concept"], # Specify the variables that will be filled in
    template=template, # Specify the template to use
)
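
With the default f-string template format, filling in a `PromptTemplate` behaves much like Python's own `str.format`. A plain-Python sketch of the same idea, with no LangChain involved, that discovers the placeholder names and fills them in:

```python
from string import Formatter

template = """
You are an expert data scientist with an expertise in building deep learning models.
Explain the concept {concept} in a couple of sentences.
"""

# Discover the placeholder names that appear in the template.
variables = [field for _, field, _, _ in Formatter().parse(template) if field]

# Fill in the placeholder the way str.format does.
filled = template.format(concept="gradient descent")

print(variables)
print(filled)
```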

**STEP 2:** Print the prompt template for the chat model.

In [9]:
prompt # Print the prompt

PromptTemplate(input_variables=['concept'], output_parser=None, partial_variables={}, template='\nYou are an expert data scientist with an expertise in building deep learning models.\nExplain the concept {concept} in a couple of sentences.\n', template_format='f-string', validate_template=True)

**STEP 3:** Define the LLM and send it the formatted prompt, with the variable filled in.

In [10]:
llm = OpenAI(model_name="text-davinci-003") # Create a new model instance
llm(prompt.format(concept="gradient descent")) # Call the model with the prompt, with the variable filled in

'\nGradient descent is an optimization algorithm used to find the values of parameters (coefficients) of a function that minimizes a cost function. It works by iteratively computing the partial derivatives of the cost function with respect to the parameters and updating the parameters in the negative direction of the gradient of the cost function with respect to the parameters.'

***III. SETTING UP AND USING --> Chains***

**STEP 1:** Make a chain call to the model using the prompt template that we have already defined.

In [11]:
from langchain.chains import LLMChain # Import the LLMChain class, which pairs a prompt template with a language model.

chain = LLMChain( # Create a new chain instance
    llm=llm, # Specify the language model to use
    prompt=prompt, # Specify the prompt to use
)

# Run the chain, specifying only the input variable.
print(chain.run("gradient descent")) # Call the chain with the input variable



Gradient descent is an optimization algorithm used to find the set of parameters that minimize a given cost function. It works by iteratively updating the parameters in the direction of the negative gradient of the cost function with respect to the parameters. The magnitude of the updates is determined by a parameter called the learning rate.
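
Conceptually, a chain like this formats the prompt with the input and then passes the result to the model. The sketch below illustrates that flow in plain Python. `make_chain` and `fake_llm` are hypothetical names, and `fake_llm` only echoes its prompt instead of calling a real model:

```python
def make_chain(llm, template, variable):
    """Return a function that fills `variable` in `template` and calls `llm`."""
    def run(value):
        prompt_text = template.format(**{variable: value})
        return llm(prompt_text)
    return run

def fake_llm(prompt_text):
    # Stand-in for a real model call: echoes the prompt it received.
    return f"[model reply to: {prompt_text}]"

chain_sketch = make_chain(
    fake_llm,
    "Explain the concept {concept} in a couple of sentences.",
    "concept",
)
print(chain_sketch("gradient descent"))
```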


**STEP 2:** Add a new link to the chain.

In [16]:
template = "Turn this({ml_concept}) concept description into a 500 word explanation that a five year old child would understand." # Specify the template

second_prompt = PromptTemplate( # Create a new prompt instance
    input_variables=["ml_concept"], # Specify the variables that will be filled in
    template=template, # Specify the template to use
)   

chain_two = LLMChain( # Create a new chain instance 
    llm=llm, # Specify the language model to use
    prompt=second_prompt, # Specify the prompt to use
)

**STEP 3:** Combine the two links of the chain into a single chain.

In [17]:
from langchain.chains import SimpleSequentialChain # Import the SimpleSequentialChain class, which runs chains in order and passes each chain's output to the next one as its input.

sequential_chain = SimpleSequentialChain( # Create a new chain instance
    chains=[chain, chain_two], # Specify the chains to use
    verbose=True, # Specify whether to print the intermediate results
)

# Run the combined chain, specifying only the input variable for the first chain.
explanation = sequential_chain.run("gradient descent") # Call the chain with the input variable



> Entering new SimpleSequentialChain chain...

Gradient descent is an iterative optimization algorithm used to minimize a function by adjusting the parameters of a model according to the direction of the gradient of the loss function with respect to the model parameters. It is an important technique for optimizing deep learning models by finding the optimal set of parameters that minimize the loss function and improve the model's performance.

Gradient descent is like a game where you have to find the best way down a mountain. You have to try different paths to see which one is the fastest way to the bottom.

You start at the top of the mountain. You can see the bottom, but it's a long way down. You don't know the best way to get there, so you have to try different paths.

For each path, you need to keep track of how long it takes you to get down the mountain. You also need to make sure you don't get lost and end up in the wrong place.

At the e
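
The verbose output above shows the first chain's description being handed to the second chain. That hand-off can be sketched as plain function composition. The helper `run_sequence` and the two step functions are hypothetical stand-ins for the chains, not real model calls:

```python
from functools import reduce

def run_sequence(steps, first_input):
    """Feed `first_input` to the first step, then each output to the next step."""
    return reduce(lambda value, step: step(value), steps, first_input)

# Hypothetical stand-ins for the two chains.
def describe(concept):
    return f"{concept} is an optimization algorithm."

def simplify(description):
    return f"For a five year old: {description}"

result_text = run_sequence([describe, simplify], "Gradient descent")
print(result_text)
```

Because each step receives exactly one value, this pattern only fits steps with a single input and a single output.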

***IV. SETTING UP AND USING --> Embeddings and Vector Stores***

**Step 1:** Import the text splitter class and use it to split the text into chunks.

In [18]:
from langchain.text_splitter import RecursiveCharacterTextSplitter # Import the RecursiveCharacterTextSplitter class, which allows you to split text into smaller chunks.

text_splitter = RecursiveCharacterTextSplitter( # Create a new text splitter instance
    chunk_size = 100, # Specify the maximum length of the chunks
    chunk_overlap = 0, # Specify the overlap between chunks
)

texts = text_splitter.create_documents([explanation]) # Split the text into chunks

**Step 2:** Check out the chunks.

In [19]:
texts # Print the chunks

[Document(page_content='Gradient descent is like a game where you have to find the best way down a mountain. You have to try', metadata={}),
 Document(page_content='different paths to see which one is the fastest way to the bottom.', metadata={}),
 Document(page_content="You start at the top of the mountain. You can see the bottom, but it's a long way down. You don't", metadata={}),
 Document(page_content='know the best way to get there, so you have to try different paths.', metadata={}),
 Document(page_content='For each path, you need to keep track of how long it takes you to get down the mountain. You also', metadata={}),
 Document(page_content="need to make sure you don't get lost and end up in the wrong place.", metadata={}),
 Document(page_content='At the end of each path, you compare the time you took with the time it took to go down the other', metadata={}),
 Document(page_content='paths. The fastest path is the one with the shortest time.', metadata={}),
 Document(page_content=

**Step 3:** Extract the plain text from an individual element of the list with its page_content attribute.

In [20]:
texts[0].page_content # Print the first chunk

'Gradient descent is like a game where you have to find the best way down a mountain. You have to try'
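
To build intuition for how a chunk size of 100 characters shapes the output, here is a simplified sketch that greedily packs whole words into chunks. It is an assumption-laden illustration and not the algorithm used by `RecursiveCharacterTextSplitter`, which works through a hierarchy of separators. The function name `split_into_chunks` is hypothetical:

```python
def split_into_chunks(text, chunk_size=100):
    """Greedily pack whole words into chunks of at most `chunk_size` characters.

    A single word longer than `chunk_size` is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks

sample = ("Gradient descent is like a game where you have to find the best way "
          "down a mountain. You have to try different paths to see which one is "
          "the fastest way to the bottom.")
chunks = split_into_chunks(sample)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```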

**Step 4:** Import and then create a new instance of the OpenAIEmbeddings class.

In [22]:
from langchain.embeddings import OpenAIEmbeddings # Import the OpenAIEmbeddings class, which allows you to create embeddings from text.

embeddings = OpenAIEmbeddings(model_name="ada") # Create a new embeddings instance

**Step 5:** Call "embed_query" on the raw text of the first chunk's page_content attribute.

In [24]:
query_result = embeddings.embed_query(texts[0].page_content) # Create embeddings from the first chunk

query_result # Print the embeddings 

# You should see the vector representation of that text (i.e., the embedding) below:

[0.015347131240837595,
 0.006830774119852393,
 0.030486165906851145,
 0.041140715753248555,
 -0.013089282193277384,
 0.04711309359742752,
 -0.0028795380537913863,
 0.019030442105339277,
 0.021392341302956624,
 -0.03662501952077325,
 -0.048569767758550715,
 -0.0448656483541537,
 0.02592884886929494,
 0.008386296994550813,
 0.00504374354954852,
 -0.0387059852690137,
 0.043908401502259935,
 0.03610477528974547,
 0.004000659025864782,
 0.017032714241970403,
 0.034460812162371006,
 -0.033711663747946395,
 0.04711309359742752,
 0.013411831791122398,
 0.02865491567587053,
 0.024368123626792047,
 -0.010212345789049514,
 0.034564863243750694,
 -0.0060504114986009465,
 -0.023369259695107607,
 -0.008578786465961438,
 -0.031922034322046676,
 0.011143578007953242,
 0.04940215778313713,
 -0.0011289246401954246,
 0.02948730197516671,
 0.029404064090295133,
 0.041182334695684344,
 -0.030028353814767268,
 -0.014389886251588943,
 0.01222568075583181,
 0.0002477651240982619,
 0.012121632468419786,
 0.040

**Step 6:** Store the resulting vector embeddings in a Pinecone vector store.

In [32]:
import os # Import the os module, which allows you to interact with the operating system.
import pinecone # Import the pinecone module, which allows you to interact with the Pinecone API.
from langchain.vectorstores import Pinecone # Import the Pinecone class, which allows you to create vector stores.

# Initialize the Pinecone client
pinecone.init(
    api_key=os.environ.get("PINECONE_API_KEY"), # Specify the Pinecone API key
    environment=os.environ.get("PINECONE_ENV"), # Specify the Pinecone environment
)

# Embed the chunks and store the embeddings in the named Pinecone index.
index_name = "index-name" # Specify the index name
search = Pinecone.from_documents(texts, embeddings, index_name=index_name)

**Step 7:** Query the vector database via similarity search.

In [33]:
query = "What's so interesting about gradient descent?" # Specify the query
result = search.similarity_search(query) # Search for similar documents

**Step 8:** Display the results of the similarity search.

In [34]:
result # Print the result

[Document(page_content='that tells you how long it takes you to get down each path.', metadata={}),
 Document(page_content='Gradient descent is like a game where you have to find the best way down a mountain. You have to try', metadata={}),
 Document(page_content='Gradient descent is like this game, but instead of a mountain, it is used to find the best way to', metadata={}),
 Document(page_content='different paths to see which one is the fastest way to the bottom.', metadata={})]
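
Similarity search ranks stored vectors by how close they are to the query's vector. Which metric is used depends on how the index was configured; cosine similarity is a common choice and is assumed here. The sketch below uses made-up three-dimensional vectors purely for illustration, whereas real embeddings have far more dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors `a` and `b`."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" invented for this example.
store = {
    "chunk about mountains": [0.9, 0.1, 0.0],
    "chunk about cooking": [0.0, 0.2, 0.9],
    "chunk about paths": [0.7, 0.6, 0.1],
}
query_vector = [1.0, 0.2, 0.0]

# Rank the stored chunks from most to least similar to the query.
ranked = sorted(
    store,
    key=lambda name: cosine_similarity(store[name], query_vector),
    reverse=True,
)
print(ranked)
```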

***V. SETTING UP AND USING --> Agents***

**Step 1:** Call the class imports necessary to run the agent.

In [35]:
from langchain.agents.agent_toolkits import create_python_agent # Import the create_python_agent function, which allows you to create a Python agent.
from langchain.tools.python.tool import PythonREPLTool # Import the PythonREPLTool class, which allows you to create a Python REPL tool.
from langchain.tools.python.tool import PythonREPL # Import the PythonREPL class, which allows you to create a Python REPL.
from langchain.llms.openai import OpenAI # Import the OpenAI class, which allows you to create a language model.

**Step 2:** Create a Python agent executor.

In [36]:
agent_executer = create_python_agent( # Create a new agent executor instance
    llm = OpenAI(temperature=0, max_tokens=1000), # Specify the language model to use and limit the number of tokens to 1000
    tool=PythonREPLTool(), # Specify the tool to use
    verbose=True, # Specify whether to print the intermediate results
)

**Step 3:** Have the agent execute Python code.

In [37]:
agent_executer.run("Find the roots (zeros) of the quadratic equation x^2 + 2x + 1 = 0.") # Call the agent executor with a prompt



> Entering new AgentExecutor chain...
 I need to solve a quadratic equation.
Action: Python REPL
Action Input: import numpy as np
Observation: 
Thought: I can use the numpy function to solve the equation.
Action: Python REPL
Action Input: np.roots([1,2,1])
Observation: 
Thought: I now know the final answer.
Final Answer: The roots of the equation are -1 and -1.

> Finished chain.


'The roots of the equation are -1 and -1.'
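
The Observation lines in the trace above are empty, so it is worth checking the agent's answer independently. Since x^2 + 2x + 1 = (x + 1)^2, the equation has a double root at -1. A small sketch that confirms this with the quadratic formula; the helper name `quadratic_roots` is chosen here for illustration:

```python
import cmath

def quadratic_roots(a, b, c):
    """Return both roots of a*x^2 + b*x + c = 0 via the quadratic formula."""
    sqrt_discriminant = cmath.sqrt(b * b - 4 * a * c)
    return (
        (-b + sqrt_discriminant) / (2 * a),
        (-b - sqrt_discriminant) / (2 * a),
    )

roots = quadratic_roots(1, 2, 1)
print(roots)
```

Using `cmath` rather than `math` keeps the helper usable when the discriminant is negative and the roots are complex.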