<a href="https://colab.research.google.com/github/rabbitmetrics/langchain-13-min/blob/main/notebooks/langchain-13-min.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Load environment variables
from dotenv import load_dotenv,find_dotenv
load_dotenv(find_dotenv())

True

## Basic LLM Models

In [2]:
# Run basic query with OpenAI wrapper

from langchain.llms import OpenAI
llm = OpenAI(model_name="text-davinci-003")
llm("explain large language models to a child.")

'\n\nLarge language models are like an encyclopedia for words. They help computers understand how the words in a language fit together and how they should be used. They help computers learn to think like humans, so they can understand the things we say and write.'

### Chat LLM Models

In [3]:
# import schema for chat messages and ChatOpenAI in order to query chatmodels GPT-3.5-turbo or GPT-4

from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
from langchain.chat_models import ChatOpenAI

In [4]:
chat = ChatOpenAI(model_name="gpt-3.5-turbo",temperature=0.3)
messages = [
    SystemMessage(content="You are an expert data scientist"),
    HumanMessage(content="Write a Python script that trains a neural network on simulated data using pytorch.")
]
response=chat(messages)

print(response.content,end='\n')

Sure, here's an example script that trains a simple neural network on simulated data using PyTorch:

```python
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# Define the neural network architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(2, 10)
        self.fc2 = nn.Linear(10, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Generate some simulated data
x_train = np.random.rand(100, 2)
y_train = np.sum(x_train, axis=1, keepdims=True)

# Convert the data to PyTorch tensors
x_train = torch.from_numpy(x_train).float()
y_train = torch.from_numpy(y_train).float()

# Initialize the neural network and the optimizer
net = Net()
optimizer = optim.SGD(net.parameters(), lr=0.1)

# Train the neural network
for epoch in range(100):
    optimizer.zero_grad()
    y_pred = net(x_train)
    loss = nn.MSELoss()(y_pred, y_train)
    loss.

### LLM and ChatLLM Prompting

In [5]:
# Import prompt and define PromptTemplate

from langchain import PromptTemplate

template = """
You are an expert physicist with an expertise in theoretical physics. 
Mathematically derive {concept} . 
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template,
)



In [6]:
# Run LLM with PromptTemplate

llm(prompt.format(concept="time dilation in special relativity"))

"\nTime dilation in special relativity is derived from two postulates: the principle of relativity and the constancy of the speed of light.\n\nThe principle of relativity states that the laws of physics are the same in all inertial frames. In other words, laws of physics should look the same in any inertial frame of reference regardless of the motion of that frame.\n\nThe constancy of the speed of light states that the speed of light in a vacuum is always the same, no matter the motion of the observer or the source. \n\nThese two postulates combined allow us to derive the time dilation equation. \n\nLet's assume two frames of reference, S and S'. The distance between two events A and B is $\\Delta x$, and the time elapsed between them is $\\Delta t$. The time elapsed in frame S' between the same two events is $\\Delta t'$. We can relate these two times by the Lorentz transformation.\n\n$$\\Delta t' = \\gamma(\\Delta t - \\frac{v\\Delta x}{c^2})$$\n\nwhere $\\gamma$ is the Lorentz facto

In [7]:
# Run chatopenai with PromptTemplate
chat = ChatOpenAI(model_name="gpt-3.5-turbo",temperature=0.3)


template2 = """
Mathematically derive {concept} . 
"""

prompt2 = PromptTemplate(
    input_variables=["concept"],
    template=template,
)


messages2 = [
    SystemMessage(content="You are an expert physicist with an expertise in theoretical physics"),
    HumanMessage(content=prompt.format(concept="time dilation in special relativity") )
]
response=chat(messages)

print(response.content,end='\n')

Sure, here's an example script that trains a simple neural network on simulated data using PyTorch:

```
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# Define the neural network architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(2, 10)
        self.fc2 = nn.Linear(10, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Generate some simulated data
X = np.random.rand(100, 2)
y = np.sum(X, axis=1).reshape(-1, 1)

# Convert the data to PyTorch tensors
X = torch.from_numpy(X).float()
y = torch.from_numpy(y).float()

# Initialize the neural network and the optimizer
net = Net()
optimizer = optim.SGD(net.parameters(), lr=0.01)

# Train the neural network
for epoch in range(1000):
    optimizer.zero_grad()
    output = net(X)
    loss = nn.MSELoss()(output, y)
    loss.backward()
    optimizer.step()

    if epoch % 100 == 0:


### LLM Chains

In [8]:
# Import LLMChain and define chain with language model and prompt as arguments.

from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain only specifying the input variable.
print(chain.run("lorentz transformation")) 


The Lorentz transformation is a mathematical equation that describes the transformation of space and time between two inertial frames of reference in special relativity. It is given by the following equation: 

x' = γ(x - vt) 

t' = γ(t - vx/c²) 

where x is the position coordinate, t is the time coordinate, v is the relative velocity between the two frames, c is the speed of light, and γ is the Lorentz factor. 

The Lorentz factor γ is given by: 

γ = 1/√(1 - (v²/c²)) 

Using the above equation, we can then derive the Lorentz transformation as follows: 

x' = (1/√(1-(v²/c²)))(x - vt) 

t' = (1/√(1-(v²/c²)))(t - (v/c²)x)


In [9]:
# Define a second prompt 

second_prompt = PromptTemplate(
    input_variables=["concept"],
    template="Turn the concept description of {concept} and explain it to me like I'm five. \
    Provide complete responses. ",
)
chain_two = LLMChain(llm=llm, prompt=second_prompt)

In [10]:
# Define a sequential chain using the two chains above: the second chain takes the output of the first chain as input

from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

# Run the chain specifying only the input variable for the first chain.
explanation = overall_chain.run("lorentz transformation")
print(explanation)



[1m> Entering new SimpleSequentialChain chain...[0m
[36;1m[1;3m
The Lorentz transformation is a mathematical transformation between two coordinate systems S and S' which are in uniform relative motion with respect to each other.

Let x,y,z be the coordinates of a point in S and x',y',z' be the coordinates of the same point in S'. Let v be the uniform velocity of S' relative to S. 

The Lorentz Transformation is given by:

x' = γ(x - vt) 
y' = y 
z' = z 

where γ = (1 - v2/c2)−1/2

Let t' = t be the coordinates of a point in S and t'' = t' be the coordinates of the same point in S'. 

Then the Lorentz Transformation can be written as: 

t'' = γ(t - vx/c2) 
x'' = γ(x - vt) 
y'' = y 
z'' = z[0m
[33;1m[1;3m

The Lorentz Transformation is a way of figuring out where something is in two different places at the same time. Imagine you are standing in one place and then you move to another place. The Lorentz Transformation helps you figure out where you are in both places at the same t

In [11]:
# Import LLMChain and define chain with language model and prompt as arguments.

from langchain.prompts import (
    ChatPromptTemplate,
    PromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.chains import SimpleSequentialChain, SequentialChain

# ChatModel
chat = ChatOpenAI(model_name="gpt-3.5-turbo",temperature=0.3)


# Define system prompts and human prompt for chat model
system_prompt1 = PromptTemplate(
    template = "You are a data scientist with expertise in regression and statistics. Please provide answer to following questions. ",
    input_variables=[]
)

human_prompt1 = PromptTemplate(template="Randomly pick one metric to evaluate performance of {model name} model \
                                         and explain it to a non-technical user.\
                                         Assumption:  ", 
                               input_variables=["model name"])

system_message_prompt1 = SystemMessagePromptTemplate(prompt=system_prompt1)
human_message_prompt1 = HumanMessagePromptTemplate(prompt=human_prompt1)

messages1 = [system_message_prompt1, human_message_prompt1]

# Covert messages into chat prompts template
chat_prompt_template = ChatPromptTemplate.from_messages(messages1)

# Create LLM Chain
chain = LLMChain(llm=chat, prompt=chat_prompt_template, output_key="assumption")

# Run the chain only specifying the input variable.
#print(chain.run({"variable":"rainfall in mm", "plant":"rose"})) 

#Second Prompt
human_prompt2 = PromptTemplate(template="Randomly pick one metric to evaluate performance of this model except {assumption} \
                                         and explain it to a non-technical user.\
                                         Assumption 2 = ", 
                               input_variables=["assumption"])

human_message_prompt2 = HumanMessagePromptTemplate(prompt=human_prompt2)

messages2 = [ human_message_prompt2]

# Covert messages into chat prompts template
chat_prompt_template2 = ChatPromptTemplate.from_messages(messages2)

# Create LLM Chain
chain_two = LLMChain(llm=chat, prompt=chat_prompt_template2, output_key="assumption2")

overall_chain = SequentialChain(chains=[chain, chain_two],
                                      input_variables=["model name"],
                                      output_variables=["assumption", "assumption2"],
                                      verbose=True)
explanation = overall_chain({"model name":"Linear Regression"})
print(explanation)

# Run the chain only specifying the input variable.
#print(chain.run()) 



[1m> Entering new SequentialChain chain...[0m

[1m> Finished chain.[0m
{'model name': 'Linear Regression', 'assumption': 'One commonly used metric to evaluate the performance of a Linear Regression model is the Root Mean Squared Error (RMSE). \n\nRMSE is a measure of how well the model is able to predict the outcome variable (also known as the dependent variable) based on the input variables (also known as the independent variables). It is calculated by taking the square root of the average of the squared differences between the predicted values and the actual values of the outcome variable.\n\nTo explain this to a non-technical user, imagine that you are trying to predict the price of a house based on its size, number of bedrooms, and location. You build a Linear Regression model using historical data on house prices and their characteristics. \n\nTo evaluate how well your model is able to predict the actual prices of houses, you can use the RMSE metric. The RMSE tells you how f

In [12]:
explanation

{'model name': 'Linear Regression',
 'assumption': 'One commonly used metric to evaluate the performance of a Linear Regression model is the Root Mean Squared Error (RMSE). \n\nRMSE is a measure of how well the model is able to predict the outcome variable (also known as the dependent variable) based on the input variables (also known as the independent variables). It is calculated by taking the square root of the average of the squared differences between the predicted values and the actual values of the outcome variable.\n\nTo explain this to a non-technical user, imagine that you are trying to predict the price of a house based on its size, number of bedrooms, and location. You build a Linear Regression model using historical data on house prices and their characteristics. \n\nTo evaluate how well your model is able to predict the actual prices of houses, you can use the RMSE metric. The RMSE tells you how far off your predicted prices are from the actual prices, on average. \n\nFor

### Text Splitting

In [15]:
# Import utility for splitting up texts and split up the explanation given above into document chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 250,
    chunk_overlap  = 0,
)

texts = text_splitter.create_documents([explanation['assumption'] + explanation['assumption2']])

texts

[Document(page_content='One commonly used metric to evaluate the performance of a Linear Regression model is the Root Mean Squared Error (RMSE).', metadata={}),
 Document(page_content='RMSE is a measure of how well the model is able to predict the outcome variable (also known as the dependent variable) based on the input variables (also known as the independent variables). It is calculated by taking the square root of the average', metadata={}),
 Document(page_content='of the squared differences between the predicted values and the actual values of the outcome variable.', metadata={}),
 Document(page_content='To explain this to a non-technical user, imagine that you are trying to predict the price of a house based on its size, number of bedrooms, and location. You build a Linear Regression model using historical data on house prices and their', metadata={}),
 Document(page_content='characteristics.', metadata={}),
 Document(page_content='To evaluate how well your model is able to predi

In [16]:
# Individual text chunks can be accessed with "page_content"
for i in range(len(texts)):
    print(texts[i].page_content)

One commonly used metric to evaluate the performance of a Linear Regression model is the Root Mean Squared Error (RMSE).
RMSE is a measure of how well the model is able to predict the outcome variable (also known as the dependent variable) based on the input variables (also known as the independent variables). It is calculated by taking the square root of the average
of the squared differences between the predicted values and the actual values of the outcome variable.
To explain this to a non-technical user, imagine that you are trying to predict the price of a house based on its size, number of bedrooms, and location. You build a Linear Regression model using historical data on house prices and their
characteristics.
To evaluate how well your model is able to predict the actual prices of houses, you can use the RMSE metric. The RMSE tells you how far off your predicted prices are from the actual prices, on average.
For example, if the RMSE is $10,000, it means that on average, your pr

### Create Embeddings

In [17]:
# Import and instantiate OpenAI embeddings

from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model_name="ada")

In [19]:
# Turn the first text chunk into a vector with the embedding
import numpy as np

query_result = embeddings.embed_query(texts[i].page_content)
    

In [20]:
# Import and initialize Pinecone client

import os
import pinecone
from langchain.vectorstores import Chroma
index_name = "langchain-quickstart"

docsearch = Chroma.from_texts(texts, embeddings)
#pinecone.init(
#    api_key=os.getenv('PINECONE_API_KEY'),  
#    environment=os.getenv('PINECONE_ENV')  
#)

# Upload vectors to Pinecone

#index_name = "langchain-quickstart"
#search = Pinecone.from_documents(texts, embeddings, index_name=index_name)

query = "What can Lorentz transformation tell us?"
docs = docsearch.similarity_search(query)

print(docs[0].page_content)

  from tqdm.autonotebook import tqdm
Using embedded DuckDB without persistence: data will be transient


AttributeError: 'Document' object has no attribute 'replace'

In [None]:
# Do a simple vector similarity search

query = "What is magical about an autoencoder?"
result = search.similarity_search(query)

print(result)

In [None]:
# Import Python REPL tool and instantiate Python agent

from langchain.agents.agent_toolkits import create_python_agent
from langchain.tools.python.tool import PythonREPLTool
from langchain.python import PythonREPL
from langchain.llms.openai import OpenAI

agent_executor = create_python_agent(
    llm=OpenAI(temperature=0, max_tokens=1000),
    tool=PythonREPLTool(),
    verbose=True
)

In [None]:
# Execute the Python agent

agent_executor.run("Find the roots (zeros) if the quadratic function 3 * x**2 + 2*x -1")