<a href="https://colab.research.google.com/github/rabbitmetrics/langchain-13-min/blob/main/notebooks/langchain-13-min.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Load environment variables

from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

True

In [2]:
# Run basic query with OpenAI wrapper

from langchain.llms import OpenAI
llm = OpenAI(model_name="text-davinci-003")
llm("explain large language models in one sentence")



'\n\nLarge language models are neural networks that use large amounts of text data to learn the statistical properties of language and generate text.'

In [3]:
# Import the chat message schema and ChatOpenAI to query chat models such as GPT-3.5-turbo or GPT-4

from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
from langchain.chat_models import ChatOpenAI

In [4]:
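# Instantiate the chat model; a low temperature keeps the output fairly deterministic
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)

# The system message sets the role, the human message carries the request
messages = [
    SystemMessage(content="You are an expert data scientist"),
    HumanMessage(content="Write a Python script that trains a neural network on simulated data")
]
response = chat(messages)

print(response.content)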

Sure! Here's an example of a Python script that trains a neural network on simulated data using the Keras library:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Generate simulated data
np.random.seed(0)
X = np.random.rand(100, 2)
y = np.random.randint(2, size=100)

# Define the neural network model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=10, batch_size=10)

# Evaluate the model
loss, accuracy = model.evaluate(X, y)
print('Loss:', loss)
print('Accuracy:', accuracy)
```

In this script, we first generate simulated data using `numpy.random.rand()` and `numpy.random.randint()`. We then define a neural network model using the `Sequential` class from Keras and add two dense layers using the `Dense` class. The first layer h

In [5]:
# Import prompt and define PromptTemplate

from langchain import PromptTemplate

template = """
You are an expert data scientist with an expertise in building deep learning models. 
Explain the concept of {concept} in a couple of lines
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template,
)

In [6]:
# Run LLM with PromptTemplate

llm(prompt.format(concept="autoencoder"))

'\nAn autoencoder is a type of artificial neural network used for unsupervised learning that learns a compressed representation of input data. It is composed of an encoder that maps the input data to a compressed representation and a decoder that reconstructs the input data from the compressed representation.'
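
For intuition, the formatting step above behaves much like Python's built-in `str.format` applied to the template string. The sketch below reproduces it without LangChain; `render_prompt` is a hypothetical helper introduced for illustration, not part of any library.

```python
# Stand-in for PromptTemplate.format: plain named-placeholder substitution.
TEMPLATE = """
You are an expert data scientist with an expertise in building deep learning models.
Explain the concept of {concept} in a couple of lines
"""


def render_prompt(template: str, **variables: str) -> str:
    """Fill the named placeholders in `template` (hypothetical helper)."""
    return template.format(**variables)


rendered = render_prompt(TEMPLATE, concept="autoencoder")
print(rendered)
```

The rendered string is what ends up being sent to the model; the template itself never reaches it.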

In [7]:
# Import LLMChain and define chain with language model and prompt as arguments.

from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain only specifying the input variable.
print(chain.run("autoencoder"))


An autoencoder is a type of artificial neural network used to learn efficient data representations (or features) in an unsupervised manner. It works by encoding the input data into a low-dimensional representation, which is then decoded back into the original data space. Autoencoders are used for dimensionality reduction, unsupervised feature learning, and representation learning.


In [8]:
# Define a second prompt 

second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Turn the concept description of {ml_concept} and explain it to me like I'm five in 500 words",
)
chain_two = LLMChain(llm=llm, prompt=second_prompt)

In [9]:
# Define a sequential chain using the two chains above: the second chain takes the output of the first chain as input

from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

# Run the chain specifying only the input variable for the first chain.
explanation = overall_chain.run("autoencoder")
print(explanation)



> Entering new  chain...

An autoencoder is a type of artificial neural network used to learn efficient representations of data, also known as a compressed version of the input data. Autoencoders are used to reduce the dimensionality of data while preserving the important features of the data. Autoencoders are composed of an encoder and a decoder, which work together to learn a representation of the data.
 or less

An autoencoder is a cool machine that can help us understand data better. It's like a special computer program that can take a bunch of information and sort it out. It does this by looking at the data and figuring out which parts are most important.

Think of it like a game of hide and seek. The autoencoder is like a seeker, trying to find out what's important in the data by looking for patterns and important details. It can then take this important information and hide it away, so that it's easier for us to work with.

The autoencoder
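
The sequential chain above pipes the output of one step into the next. A minimal sketch of that control flow, assuming each step is an ordinary function from string to string; `run_sequential`, `describe`, and `simplify` are hypothetical stand-ins, and no LLM is called.

```python
from typing import Callable, List


def run_sequential(steps: List[Callable[[str], str]], first_input: str) -> str:
    """Feed each step's output into the next step (hypothetical helper)."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


# Stand-in "chains": plain functions instead of LLM calls.
def describe(concept: str) -> str:
    return f"A description of {concept}."


def simplify(description: str) -> str:
    return f"Explained simply: {description}"


result = run_sequential([describe, simplify], "autoencoder")
print(result)  # Explained simply: A description of autoencoder.
```

Only the first step sees the original input, which matches running the chain with just the first chain's variable.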

In [10]:
# Import utility for splitting up texts and split up the explanation given above into document chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=0,
)

texts = text_splitter.create_documents([explanation])


In [11]:
# Individual text chunks can be accessed with "page_content"

texts[0].page_content

'or less'
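
To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified fixed-window splitter. It is not the algorithm used above: `RecursiveCharacterTextSplitter` tries a list of separators before falling back to raw characters, so its chunks will differ. `split_fixed` is a hypothetical helper.

```python
from typing import List


def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Cut `text` into windows of `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters (hypothetical helper).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij"
chunks = split_fixed(sample, chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij', 'j']
```

With an overlap of 1, the last character of each window reappears as the first character of the next one.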

In [12]:
# Import and instantiate OpenAI embeddings

from langchain.embeddings import OpenAIEmbeddings

# The keyword is `model`; passing `model_name` raises a ValidationError
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

In [1]:
# Turn the first text chunk into a vector with the embedding

query_result = embeddings.embed_query(texts[0].page_content)
print(query_result)

In [None]:
# Import and initialize Pinecone client

import os
import pinecone
from langchain.vectorstores import Pinecone


pinecone.init(
    api_key=os.getenv('PINECONE_API_KEY'),  
    environment=os.getenv('PINECONE_ENV')  
)

In [None]:
# Upload vectors to Pinecone

index_name = "langchain-quickstart"
search = Pinecone.from_documents(texts, embeddings, index_name=index_name)

In [None]:
# Do a simple vector similarity search

query = "What is magical about an autoencoder?"
result = search.similarity_search(query)

print(result)
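
Conceptually, a similarity search ranks stored vectors by how close they are to the query vector. The sketch below assumes cosine similarity (the metric actually used depends on how the index is configured) and uses tiny hand-made vectors rather than real embeddings; `cosine_similarity`, `top_k`, and `toy_store` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors in this sketch
    return dot / (norm_a * norm_b)


def top_k(query: List[float], store: Dict[str, List[float]], k: int = 1) -> List[Tuple[str, float]]:
    """Return the k stored items most similar to `query`, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real ones have many more dimensions.
toy_store = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.0, 1.0, 0.0],
    "chunk_c": [0.7, 0.7, 0.0],
}
best = top_k([0.9, 0.1, 0.0], toy_store, k=2)
print(best)
```

A vector database does the same ranking, but with approximate indexes so it scales beyond a brute-force scan.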

In [1]:
# Import Python REPL tool and instantiate Python agent

from langchain.agents.agent_toolkits import create_python_agent
from langchain.tools.python.tool import PythonREPLTool
from langchain.llms.openai import OpenAI

agent_executor = create_python_agent(
    llm=OpenAI(temperature=0, max_tokens=1000),
    tool=PythonREPLTool(),
    verbose=True
)



In [3]:
# Execute the Python agent

agent_executor.run("What is x**3 + 5*x**2 + 3*x + 1, for the values of x between 5 and 10?")



> Entering new  chain...
 I need to calculate the value of the equation for each value of x
Action: Python_REPL
Action Input: for x in range(5, 11): print(x**3 + 5*x**2 + 3*x + 1)
Observation: 266
415
610
857
1162
1531

Thought: I now know the final answer
Final Answer: 266, 415, 610, 857, 1162, 1531

> Finished chain.


'266, 415, 610, 857, 1162, 1531'
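
The agent's answer can be checked directly in plain Python. This sketch assumes "between 5 and 10" means the integers 5 through 10 inclusive, which is how the agent interpreted it; `poly` is a hypothetical helper name.

```python
def poly(x: int) -> int:
    """Evaluate x**3 + 5*x**2 + 3*x + 1."""
    return x**3 + 5 * x**2 + 3 * x + 1


values = [poly(x) for x in range(5, 11)]
print(values)  # [266, 415, 610, 857, 1162, 1531]
```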