# Prompts and LLMs

In [1]:
# Load environment variables

from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

True
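The `True` returned above does not by itself confirm that a particular key is defined. A minimal sketch of an explicit check follows, assuming the OpenAI wrapper reads `OPENAI_API_KEY` and that the Pinecone names match the ones used later in this notebook. The helper name `missing_env_vars` is hypothetical and not part of `python-dotenv`.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Use an explicit mapping so the example does not depend on the real environment.
example_env = {"OPENAI_API_KEY": "sk-placeholder", "PINECONE_ENV": ""}
missing = missing_env_vars(
    ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENV"], example_env
)
print(missing)
```

Calling `missing_env_vars([...])` without the second argument checks the real process environment instead.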

In [2]:
# Run basic query with OpenAI wrapper

from langchain.llms import OpenAI
llm = OpenAI(model_name="text-davinci-003")
llm("explain large language models in one sentence")

'\n\nLarge language models are deep learning models that are trained on large datasets to capture the complexity and context of natural language.'

In [3]:
# Import the chat message schema and ChatOpenAI to query chat models such as GPT-3.5-turbo or GPT-4

from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
from langchain.chat_models import ChatOpenAI

In [4]:
# Send a system message and a human message to the chat model and print the reply

chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)
messages = [
    SystemMessage(content="You are an expert data scientist"),
    HumanMessage(content="Write a Python script that trains a neural network on simulated data")
]
response = chat(messages)

print(response.content)

Sure! Here's an example of a Python script that trains a neural network on simulated data using the Keras library:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Generate simulated data
np.random.seed(0)
X = np.random.rand(100, 2)
y = np.random.randint(2, size=(100,))

# Define the neural network model
model = Sequential()
model.add(Dense(10, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=10, batch_size=10)

# Evaluate the model
loss, accuracy = model.evaluate(X, y)
print(f"Loss: {loss}, Accuracy: {accuracy}")
```

In this script, we first generate simulated data using `numpy.random.rand` and `numpy.random.randint`. We then define a simple neural network model using the `Sequential` class from Keras. The model consists of two dense layers, with 10 neurons in the f
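The `SystemMessage` and `HumanMessage` objects used in this cell are, roughly, role-tagged pieces of text. The sketch below shows the list-of-dicts shape that chat completion APIs typically expect, using a hypothetical `Message` dataclass and `to_api_payload` helper. It is an illustration of the idea, not LangChain's internal conversion code.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "system", "user", or "assistant"
    content: str


def to_api_payload(messages):
    """Convert role-tagged messages into a list of {"role", "content"} dicts."""
    return [{"role": m.role, "content": m.content} for m in messages]


messages = [
    Message("system", "You are an expert data scientist"),
    Message("user", "Write a Python script that trains a neural network on simulated data"),
]
payload = to_api_payload(messages)
print(payload[0])
```

The system message sets the model's persona, while the user message carries the actual request.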

In [5]:
# Import prompt and define PromptTemplate

from langchain import PromptTemplate

template = """
You are an expert data scientist with an expertise in building deep learning models. 
Explain the concept of {concept} in a couple of lines
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template,
)

In [6]:
# Run LLM with PromptTemplate

llm(prompt.format(concept="gradient descent"))

'\nGradient descent is an optimization algorithm used to minimize a loss function by iteratively updating the parameters of a model in a way that moves towards a lower cost. It works by calculating the partial derivatives of the loss function with respect to each parameter in the model and then updating each parameter in the opposite direction of the gradient to minimize the cost.'
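Conceptually, formatting a prompt template amounts to filling named placeholders in a string before the text is sent to the model. The following plain-Python sketch shows that step with `str.format`. The helper `format_prompt` is hypothetical, and the real `PromptTemplate` adds validation of `input_variables` on top of this.

```python
template = """
You are an expert data scientist with an expertise in building deep learning models.
Explain the concept of {concept} in a couple of lines
"""


def format_prompt(template, **variables):
    """Fill the named placeholders; raises KeyError if a variable is missing."""
    return template.format(**variables)


filled = format_prompt(template, concept="gradient descent")
print(filled)
```

The resulting string is exactly what the completion model receives as its prompt.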

# Chains

In [7]:
# Import LLMChain and define chain with language model and prompt as arguments.

from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain only specifying the input variable.
print(chain.run("learning rate"))


Learning rate is a hyperparameter that controls how much to adjust the weights of the network with respect to the loss gradient. It determines how quickly a neural network model learns a problem. A high learning rate results in faster learning, while a low learning rate results in slower learning but can achieve better accuracy.


In [8]:
# Define a second prompt 

second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Turn the concept description of {ml_concept} and explain it to me like I'm five in 300 words",
)
chain_two = LLMChain(llm=llm, prompt=second_prompt)

In [9]:
# Define a sequential chain using the two chains above: the second chain takes the output of the first chain as input

from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=False)
# Pass verbose=True instead to print extra information about the intermediate output:
# overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

# Run the chain specifying only the input variable for the first chain.
explanation = overall_chain.run("xgboost")
print(explanation)

Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-text-davinci-003 in organization org-Dds5yz6ymlg7aeEWcOk97wyZ on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method..



XGBoost is an advanced machine learning algorithm that helps to make more accurate predictions. It works by combining multiple weak or base learners. A base learner is like a small piece of a larger puzzle. All of these small pieces come together to create a more complete picture.

XGBoost is a type of Gradient Boosting technique. Gradient Boosting is a process of combining multiple weak learners together to create a strong learner, which is more accurate than any of the individual learners.

To understand how XGBoost works, think of a group of five-year-olds playing a game. Each child is a weak learner, and the game is the task. To make sure that they all understand the rules of the game, the teacher (XGBoost) will explain the rules to each of them in turn. After the teacher explains the rules, each child will try to play the game.

At first, each child will make mistakes and will not be able to play the game correctly. The teacher will then use the mistakes that the children made t
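The sequential chain used here can be pictured as function composition: each step fills its prompt with the previous step's output and calls the model again. The sketch below illustrates that data flow with a hypothetical `fake_llm` stand-in, so no API call is made. The helper names `make_chain` and `run_sequential` are assumptions for illustration, not LangChain functions.

```python
def make_chain(template, llm):
    """Return a function that fills the template's {text} slot and calls the model."""
    def run(text):
        return llm(template.format(text=text))
    return run


def run_sequential(chains, first_input):
    """Pass the input through each chain in order; each output is the next input."""
    value = first_input
    for chain in chains:
        value = chain(value)
    return value


def fake_llm(prompt):
    # Hypothetical stand-in for a real model call: echoes the prompt in brackets.
    return f"[{prompt}]"


explain = make_chain("Explain {text}", fake_llm)
simplify = make_chain("Simplify {text}", fake_llm)
result = run_sequential([explain, simplify], "xgboost")
print(result)
```

Because every step triggers its own model request, a two-step chain costs at least two API calls, which is relevant when the account has a low requests-per-minute limit.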

# RecursiveCharacterTextSplitter

In [10]:
# Import utility for splitting up texts and split up the explanation given above into document chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=0,
)

texts = text_splitter.create_documents([explanation])
len(texts)

14

In [11]:
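# Print the length of each chunk's string representation.
# Note: str(doc) includes the "page_content='...' metadata={}" wrapper, so the values can exceed chunk_size;
# use len(doc.page_content) for the length of the chunk text itself.

for doc in texts:
    print(len(str(doc)))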

125
126
109
120
125
47
126
121
120
84
119
126
80
100


In [12]:
# Individual text chunks can be accessed with "page_content"

texts[0].page_content

'XGBoost is an advanced machine learning algorithm that helps to make more accurate predictions. It'
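To make the effect of `chunk_size` concrete, here is a simplified sketch that greedily packs whitespace-separated words into chunks of at most `chunk_size` characters. It only mimics the space-level behaviour and is not the library's algorithm, which also tries other separators such as paragraph and line breaks first. The function name `split_text` is hypothetical.

```python
def split_text(text, chunk_size):
    """Greedily pack whitespace-separated words into chunks of at most chunk_size characters."""
    chunks, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= chunk_size or not current:
            # Keep growing the chunk (a single over-long word is kept whole).
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


sample = (
    "XGBoost is an advanced machine learning algorithm that helps to make more "
    "accurate predictions. It works by combining multiple weak or base learners."
)
chunks = split_text(sample, chunk_size=100)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

On this sentence the sketch cuts after "It", the same place as the first chunk shown above, because adding the next word would push the chunk past 100 characters.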

# Embeddings and Vector Stores

In [13]:
# Import and instantiate OpenAI embeddings

from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

type(embeddings)

langchain.embeddings.openai.OpenAIEmbeddings

In [14]:
# Embed a sample query string and inspect the resulting vector

query_result = embeddings.embed_query('delhi high air pollution')
print(len(query_result))
query_result

1536


[0.012124793955314035,
 0.013012650799773954,
 -0.011750229043467276,
 -0.01272825840485923,
 -0.017937480060417278,
 0.016480841220095712,
 -0.04833270004357509,
 -0.005972223995063683,
 -0.041507301192073735,
 -0.019005682870063868,
 0.007886665170411231,
 0.0031335100973804357,
 0.0039502690759423255,
 0.023569821131656454,
 -0.013886634662760454,
 -0.007414991250895729,
 0.02907730909492525,
 -0.020753650596036868,
 0.03559750794194695,
 -0.0009658910264716741,
 -0.02119758041525073,
 0.03412699425750676,
 -0.007026553823236851,
 0.003710963637985569,
 -0.00024580799056878435,
 0.013088950335232568,
 0.02601142861807449,
 -0.0328784445513509,
 -0.006759503120825204,
 -0.0016690666572421451,
 0.011375664131620519,
 0.005947946743146497,
 -0.017840371052748535,
 -0.0029080776408971004,
 -0.01184733805113602,
 -0.0016699337185842337,
 -0.022154799373109952,
 0.0007079443517366341,
 0.020281976676521363,
 -0.008448511606858765,
 0.022626473292625453,
 -0.02365305902049698,
 0.004463561

In [15]:
# Turn the first text chunk into a vector with the embedding

query_result = embeddings.embed_query(texts[0].page_content)
print(len(query_result))
print(query_result)

1536
[-0.03156375069380668, -0.017111435149988058, -0.016513133052335423, -0.020036465971716395, -0.0043709281786039065, 0.03491424318571947, -0.01316929020345671, -0.01927861714472842, -0.005501053638696896, -0.023892189773191103, -0.006132594638294393, 0.006106003516738503, -0.019876919242381056, -0.019052592611503355, -0.0007636659258666237, -0.0035100380570949425, 0.0170050706637645, 0.018600541682408118, 0.002906750124150579, -0.004497236005994385, -0.005391365262278849, 0.034462190393979124, -0.009213849696147415, -0.03350490629267687, 0.01066306954623364, -0.001001324715056647, 0.03209557312492448, -0.026710857971340988, 0.0014417414463945876, 0.022615812213218164, -0.004670078761768947, -0.023293887675538468, -0.02554084489759161, 0.00023516590064798836, -0.03486106094260769, -0.026724153532118933, 0.0007291805068911691, -0.005039031970340752, -0.001917890113738575, -0.0150639132022492, 0.05214533838270904, 0.0057835852365507815, -0.008276509867334629, 2.5721269928626387e-05, -

In [16]:
# Import the Pinecone and Chroma vector stores (Pinecone initialization is left commented out)

import os
import pinecone
from langchain.vectorstores import Pinecone, Chroma

# pinecone.init(
#     api_key=os.getenv('PINECONE_API_KEY'),  
#     environment=os.getenv('PINECONE_ENV')  
# )



In [42]:
# Upload vectors to Pinecone

# index_name = "langchain-quickstart"
# search = Pinecone.from_documents(texts, embeddings, index_name=index_name)

In [21]:
# Build a local Chroma vector store from the text chunks and their embeddings

searchdb = Chroma.from_documents(texts, embeddings, persist_directory="./vector_db/chroma_db")

In [22]:
# Do a simple vector similarity search

query = "What is magical about an autoencoder?"
result = searchdb.similarity_search(query)

print(result)

[Document(page_content='their performance. This process is known as boosting.', metadata={}), Document(page_content='puzzle. All of these small pieces come together to create a more complete picture.', metadata={}), Document(page_content='XGBoost is an advanced machine learning algorithm that helps to make more accurate predictions. It', metadata={}), Document(page_content='XGBoost is a type of Gradient Boosting technique. Gradient Boosting is a process of combining', metadata={})]
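The query above asks about autoencoders, yet every result is an XGBoost chunk. A nearest-neighbour search returns the closest stored vectors whether or not they are actually relevant. The sketch below shows the idea with brute-force cosine similarity over made-up three-dimensional vectors. Chroma's actual distance metric and index structure may differ, and `top_k` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=4):
    """Return the k (text, score) pairs most similar to the query vector."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors invented for illustration (the embeddings above have 1536 dimensions).
stored = [
    ("boosting", [0.9, 0.1, 0.0]),
    ("puzzle", [0.1, 0.9, 0.0]),
    ("teacher", [0.0, 0.2, 0.9]),
]
best = top_k([1.0, 0.0, 0.0], stored, k=2)
print(best)
```

Inspecting the scores, rather than only the returned texts, is one way to notice when even the best match is a poor one.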


# Agents

In [19]:
# Import Python REPL tool and instantiate Python agent

from langchain.agents.agent_toolkits import create_python_agent
from langchain.tools.python.tool import PythonREPLTool
from langchain.llms.openai import OpenAI

agent_executor = create_python_agent(
    llm=OpenAI(temperature=0, max_tokens=1000),
    tool=PythonREPLTool(),
    verbose=True
)

In [20]:
# Execute the Python agent

agent_executor.run("Find the roots (zeros) of the quadratic function 3 * x**2 + 2*x -1")



> Entering new AgentExecutor chain...


Python REPL can execute arbitrary code. Use with caution.


[32;1m[1;3m I need to solve a quadratic equation
Action: Python_REPL
Action Input: from scipy.optimize import root[0m
Observation: [36;1m[1;3m[0m
Thought:[32;1m[1;3m I need to define the function
Action: Python_REPL
Action Input: def f(x): return 3 * x**2 + 2*x -1[0m
Observation: [36;1m[1;3m[0m
Thought:

Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-text-davinci-003 in organization org-Dds5yz6ymlg7aeEWcOk97wyZ on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method..


[32;1m[1;3m I need to find the roots of the function
Action: Python_REPL
Action Input: root(f, [0, 1])[0m
Observation: [36;1m[1;3m[0m
Thought:

Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-text-davinci-003 in organization org-Dds5yz6ymlg7aeEWcOk97wyZ on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method..

[32;1m[1;3m I now know the final answer
Final Answer: The roots of the quadratic function 3 * x**2 + 2*x -1 are -1.0 and 0.3333333333333333.[0m

[1m> Finished chain.[0m


'The roots of the quadratic function 3 * x**2 + 2*x -1 are -1.0 and 0.3333333333333333.'
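The agent's answer can be checked independently with the quadratic formula, x = (-b ± sqrt(b² - 4ac)) / (2a). For a = 3, b = 2, c = -1 the discriminant is 4 + 12 = 16, so the roots are (-2 - 4) / 6 and (-2 + 4) / 6. A small sketch of that check follows. The helper `quadratic_roots` is written for this example and is not part of the agent's tooling.

```python
import math


def quadratic_roots(a, b, c):
    """Return the real roots of a*x**2 + b*x + c = 0, smallest first."""
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real roots")
    sqrt_disc = math.sqrt(disc)
    return sorted([(-b - sqrt_disc) / (2 * a), (-b + sqrt_disc) / (2 * a)])


roots = quadratic_roots(3, 2, -1)
print(roots)

# Substitute the roots back into the polynomial as a sanity check.
for x in roots:
    print(x, 3 * x**2 + 2 * x - 1)
```

Both values agree with the agent's final answer of -1.0 and one third.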