<a href="https://colab.research.google.com/github/rabbitmetrics/langchain-13-min/blob/main/notebooks/langchain-13-min.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Load environment variables

from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

True
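`load_dotenv(find_dotenv())` copies the variables from a `.env` file into the process environment. It does not check that the keys the later cells rely on are actually present. Below is a minimal sketch of such a check. `OPENAI_API_KEY` is the variable the OpenAI client reads by default, and the two Pinecone names come from the Pinecone cell further down. `missing_env_vars` is a hypothetical helper, not part of python-dotenv.

```python
import os

# Assumed set of keys: OPENAI_API_KEY (OpenAI client default) plus the two
# Pinecone variables read later in this notebook.
REQUIRED_VARS = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENV"]


def missing_env_vars(names, env=None):
    """Hypothetical helper: return the names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```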

In [1]:
# Run basic query with OpenAI wrapper

from langchain.llms import OpenAI
llm = OpenAI(model_name="text-davinci-003")
llm("explain large language models in one sentence")

'\n\nLarge language models are powerful predictive models used to improve natural language processing tasks, such as language translation, text summarization, and sentiment analysis.'

In [2]:
# Import the chat message schema and ChatOpenAI to query chat models such as GPT-3.5-turbo or GPT-4

from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
from langchain.chat_models import ChatOpenAI

In [3]:
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)
messages = [
    SystemMessage(content="You are an expert data scientist"),
    HumanMessage(content="Write a Python script that trains a neural network on simulated data ")
]
response = chat(messages)

print(response.content)

Sure, here's an example script that trains a simple neural network on simulated data:

```python
import numpy as np
import tensorflow as tf

# Generate simulated data
x = np.random.rand(100, 10)
y = np.random.randint(0, 2, size=(100, 1))

# Define the neural network architecture
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(x, y, epochs=10, batch_size=32)

# Evaluate the model on new data
x_test = np.random.rand(10, 10)
y_test = np.random.randint(0, 2, size=(10, 1))
loss, accuracy = model.evaluate(x_test, y_test)
print(f'Test loss: {loss}, Test accuracy: {accuracy}')
```

This script generates a simulated dataset of 100 samples with 10 features and a binary label. It then defines a simple neural network with one hidden laye
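`ChatOpenAI` sends the message list to OpenAI's chat completions endpoint. On that endpoint each message is a dictionary with a `role` (`system`, `user` or `assistant`) and a `content` string, and `SystemMessage`, `HumanMessage` and `AIMessage` correspond to those three roles. The sketch below shows the correspondence using plain tuples. `ROLE_BY_TYPE` and `to_openai_messages` are illustrative names, not LangChain internals.

```python
# Hypothetical mapping from LangChain-style message kinds to OpenAI chat roles.
ROLE_BY_TYPE = {"system": "system", "human": "user", "ai": "assistant"}


def to_openai_messages(messages):
    """Convert (kind, content) pairs into role/content dictionaries."""
    return [{"role": ROLE_BY_TYPE[kind], "content": content} for kind, content in messages]


payload = to_openai_messages([
    ("system", "You are an expert data scientist"),
    ("human", "Write a Python script that trains a neural network on simulated data"),
])
print(payload)
```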

In [4]:
# Import prompt and define PromptTemplate

from langchain import PromptTemplate

template = """
You are an expert data scientist with expertise in building deep learning models.
Explain the concept of {concept} in a couple of lines
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template,
)

In [5]:
# Run LLM with PromptTemplate

llm(prompt.format(concept="autoencoder"))

'\nAn autoencoder is a type of artificial neural network that is used to learn a compressed representation of input data. It works by learning to reconstruct its input data through a series of layers, with the aim of reducing the dimensionality of the data while still preserving its important features.'
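By default `PromptTemplate` treats its template as an f-string-style template, so `prompt.format(concept="autoencoder")` fills `{concept}` much like Python's `str.format`. The following is a plain-Python sketch of that idea, not LangChain's code. It also shows the kind of check that declaring `input_variables` makes possible. `template_variables` and `format_prompt` are hypothetical helpers.

```python
from string import Formatter

template = "Explain the concept of {concept} in a couple of lines"


def template_variables(tmpl):
    """Hypothetical helper: names of the {placeholders} in an f-string-style template."""
    return sorted({field for _, field, _, _ in Formatter().parse(tmpl) if field})


def format_prompt(tmpl, **values):
    """Hypothetical helper: fill the template, failing loudly if a variable is missing."""
    missing = set(template_variables(tmpl)) - set(values)
    if missing:
        raise KeyError(f"missing prompt variables: {sorted(missing)}")
    return tmpl.format(**values)


print(template_variables(template))  # ['concept']
print(format_prompt(template, concept="autoencoder"))
```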

In [6]:
# Import LLMChain and define chain with language model and prompt as arguments.

from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain only specifying the input variable.
print(chain.run("autoencoder"))


An autoencoder is a type of artificial neural network that attempts to reconstruct its own input. It is used for unsupervised learning, where the goal is to compress the input data into a lower dimensional representation, known as an encoding, and then to reconstruct the input from the encoding.
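An `LLMChain` bundles the two manual steps from the previous cell: format the prompt with the inputs, then send the resulting text to the LLM. In the LangChain version used here, `chain.run("autoencoder")` can take a bare positional value only because the prompt has exactly one input variable. Here is a stripped-down sketch of the idea, with `str.upper` standing in for the OpenAI model so it runs offline. `run_chain` is a hypothetical name.

```python
from typing import Callable


def run_chain(llm: Callable[[str], str], template: str, **inputs: str) -> str:
    """Hypothetical chain: fill the prompt template, then pass the text to the model."""
    return llm(template.format(**inputs))


# str.upper is a stand-in for the real LLM so the sketch needs no API key.
print(run_chain(str.upper, "Explain the concept of {concept} in a couple of lines",
                concept="autoencoder"))
```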


In [13]:
# Define a second prompt and a chain that uses it

second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Turn the concept description of {ml_concept} and explain it to me like I'm five in 500 words",
)
chain_two = LLMChain(llm=llm, prompt=second_prompt)
print(chain_two.run("autoencoder"))



An autoencoder is a type of artificial neural network that helps to reduce the amount of data needed to represent a certain type of data. It works by taking a large set of data, such as images, and compressing it down into a smaller set of features that still accurately represent the original data.

To explain how an autoencoder works, let’s imagine that we have a lot of images of different cats. To make it easier to store and process these images, we can use an autoencoder to compress them.

The autoencoder starts by taking the images and breaking them down into smaller parts (like pixels). It then looks for patterns in the pixels, looking for things like edges, shapes, and colors. Once it has identified these patterns, it can create a code that represents them. This code is the autoencoder’s “compressed” version of the original image.

The autoencoder then takes this code and uses it to recreate the original image. This is done by taking the code and turning it back into the smalle

In [11]:
# Define a sequential chain using the two chains above: the second chain takes the output of the first chain as input

from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

# Run the chain specifying only the input variable for the first chain.
explanation = overall_chain.run("autoencoder")
print(explanation)



> Entering new SimpleSequentialChain chain...

An autoencoder is a type of artificial neural network that learns to copy its input to its output. It consists of two parts: an encoder, which transforms the input into a hidden representation, and a decoder, which reconstructs the input from the hidden representation. The purpose of autoencoders is to compress the input data into a lower dimensional representation, while still preserving key information.

An autoencoder is a type of machine learning tool that can help us understand data better. It's like a special kind of game where the computer has to copy its input to its output. To do this, the autoencoder has two parts. 

The first part is called the encoder. This part takes the data (like pictures, numbers, words, etc.) and turns it into a hidden representation. It's like taking the data and putting it in a secret code! 

The second part is called the decoder. This part takes the hidden repres
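`SimpleSequentialChain` expects every chain to have exactly one input and one output. The string returned by one chain becomes the only input to the next, which is how the first explanation above was fed into the "explain it like I'm five" prompt. The sketch below models that composition with plain functions. `simple_sequential`, `explain` and `simplify` are hypothetical stand-ins for the chain and for `chain`/`chain_two`.

```python
from functools import reduce
from typing import Callable, Sequence

Step = Callable[[str], str]


def simple_sequential(steps: Sequence[Step], text: str) -> str:
    """Feed each step's output string into the next step, left to right."""
    return reduce(lambda acc, step: step(acc), steps, text)


# Hypothetical stand-ins for `chain` and `chain_two`: each maps one string to another.
explain = lambda concept: f"{concept} is a network that reconstructs its input"
simplify = lambda description: f"Like I'm five: {description}"

print(simple_sequential([explain, simplify], "autoencoder"))
```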

In [14]:
# Import a text-splitting utility and split the explanation above into document chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=0,
)

texts = text_splitter.create_documents([explanation])
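`RecursiveCharacterTextSplitter` tries a list of separators in order (by default paragraph breaks, newlines, spaces, then single characters). It then merges the resulting pieces back into chunks of at most `chunk_size` characters, which is why the first chunk shown in the next cell ends on a word boundary. The function below is a simplified sketch of that idea, not LangChain's implementation. It ignores `chunk_overlap` (0 in this notebook) and any custom length function, and `split_text` is a hypothetical name.

```python
from typing import List, Sequence

DEFAULT_SEPARATORS = ("\n\n", "\n", " ", "")


def split_text(text: str, chunk_size: int = 100,
               separators: Sequence[str] = DEFAULT_SEPARATORS) -> List[str]:
    """Simplified recursive character splitting with zero overlap (hypothetical)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    # Use the first separator that actually occurs in the text ("" = characters).
    sep, finer = "", ()
    for i, option in enumerate(separators):
        if option == "" or option in text:
            sep, finer = option, tuple(separators[i + 1:])
            break
    pieces = list(text) if sep == "" else text.split(sep)

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if not piece:
            continue
        if len(piece) > chunk_size:
            # Too big on its own: flush the current chunk, then split this
            # piece again with the next, finer separator.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(split_text(piece, chunk_size, finer or ("",)))
            continue
        merged = piece if not current else current + sep + piece
        if len(merged) <= chunk_size:
            current = merged  # still fits: keep growing this chunk
        else:
            chunks.append(current)  # full: start a new chunk
            current = piece
    if current:
        chunks.append(current)
    return [chunk.strip() for chunk in chunks if chunk.strip()]
```

Calling `split_text(explanation, chunk_size=100)` should produce chunks similar to `texts`, though not necessarily identical to them.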


In [15]:
# Each chunk is a Document; its text is stored in the page_content attribute

texts[0].page_content

"An autoencoder is a type of machine learning tool that can help us understand data better. It's like"

In [16]:
# Import and instantiate OpenAI embeddings

from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model_name="ada")

In [18]:
# Turn the first text chunk into a vector with the embedding model

query_result = embeddings.embed_query(texts[0].page_content)
print(query_result)

[-0.03901302437936434, 0.040125913876690324, 0.013684441321404882, 0.031222784859566418, 0.031841058251249305, 0.011994495044215746, -0.01605448870197422, 0.007831457062940224, 0.0231440178811595, 0.003951794463796237, -0.045587327284351505, -0.023308889916373865, 0.012880686657275193, 0.01853788455901554, 0.02209295330156311, -0.003676147894872831, 0.0334691777480637, 0.0026688782936413425, 0.026111727553534133, -0.005358365824515124, 0.015796873702230017, -4.1499963074787335e-05, 0.06796880804452361, -0.019166462103653355, 0.04892599950161975, 0.031264003334031294, 0.012406676994896811, 0.045133925927882984, -0.013684441321404882, -0.017373470571624597, -0.020351484396954164, 0.009536860500884332, 0.031511310455530274, 0.034664504194319434, 0.040290787774549836, 0.0210831071964316, 0.03668419640458245, -0.010072696943637459, -0.023844726359126992, -0.019073721001768663, 0.008001482105954631, -0.003717366136507066, -0.004065144802913367, 0.0204751398203488, -0.035509478264236574, -0.0
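The embedding is just a list of floats. Texts with similar meaning are mapped to vectors that point in similar directions, and cosine similarity is a common way to measure this. Here is a minimal pure-Python sketch. In practice you would compare `query_result` with another vector from `embeddings.embed_query` or `embeddings.embed_documents`. `cosine_similarity` is a hypothetical helper, not a LangChain function.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```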

In [27]:
# Import and initialize Pinecone client

import os
import pinecone
from langchain.vectorstores import Pinecone


pinecone.init(
    api_key=os.getenv('PINECONE_API_KEY'),  
    environment=os.getenv('PINECONE_ENV')  
)

In [28]:
# Embed the text chunks and upload the vectors to Pinecone

index_name = "langchain-quickstart"
search = Pinecone.from_documents(texts, embeddings, index_name=index_name)

In [29]:
# Do a simple vector similarity search

query = "What is magical about an autoencoder?"
result = search.similarity_search(query)

print(result)

[Document(page_content='autoencoder has two parts.', metadata={}), Document(page_content="An autoencoder is a type of machine learning tool that can help us understand data better. It's like", metadata={}), Document(page_content='For example, if we have a lot of pictures of cats, an autoencoder can help us figure out the', metadata={}), Document(page_content='The goal of an autoencoder is to compress the data into a smaller representation, while still', metadata={})]
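Conceptually, `similarity_search` embeds the query with the same embedding model and returns the stored chunks whose vectors score highest. Four documents came back above, which matches LangChain's default of `k=4`. Pinecone performs this lookup with an index designed for large collections. The brute-force sketch below only illustrates the ranking step. It uses cosine similarity as an assumed metric, made-up 2-D vectors, and hypothetical names (`cosine`, `top_k`, `store`).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: Sequence[float],
          store: Dict[str, Sequence[float]], k: int = 4) -> List[Tuple[str, float]]:
    """Exact (brute-force) search: score every stored vector and keep the k best."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in store.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Made-up 2-D "embeddings"; real ones would come from OpenAIEmbeddings.
store = {
    "autoencoder has two parts.": [0.9, 0.1],
    "pictures of cats": [0.1, 0.9],
    "compress the data": [0.8, 0.3],
}
for text, score in top_k([1.0, 0.0], store, k=2):
    print(f"{score:.3f}  {text}")
```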


In [25]:
# Import Python REPL tool and instantiate Python agent

from langchain.agents.agent_toolkits import create_python_agent
from langchain.tools.python.tool import PythonREPLTool
from langchain.llms.openai import OpenAI

agent_executor = create_python_agent(
    llm=OpenAI(temperature=0, max_tokens=1000),
    tool=PythonREPLTool(),
    verbose=True
)
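The agent works in a loop. The LLM replies in a Thought / Action / Action Input format, the executor runs the named tool (here `Python REPL`) on the input and appends the result as an Observation, and the LLM is prompted again until it produces a Final Answer. The log of the next cell shows this format. Below is a minimal sketch of the parsing step only, assuming exactly that text layout. It is not LangChain's own output parser, and `parse_step` is a hypothetical name.

```python
import re
from typing import Tuple

# Assumed layout of one LLM reply, taken from the agent log in this notebook.
ACTION_RE = re.compile(r"Action:\s*(.*?)\s*\nAction Input:\s*(.*)", re.DOTALL)
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def parse_step(llm_output: str) -> Tuple[str, str]:
    """Return ("final", answer) or (tool_name, tool_input) for one LLM reply."""
    final = FINAL_RE.search(llm_output)
    if final:
        return "final", final.group(1).strip()
    action = ACTION_RE.search(llm_output)
    if action is None:
        raise ValueError(f"could not parse agent output: {llm_output!r}")
    return action.group(1).strip(), action.group(2).strip()


print(parse_step(" I need to solve a quadratic equation\nAction: Python REPL\nAction Input: import numpy as np"))
```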

In [26]:
# Execute the Python agent

agent_executor.run("Find the roots (zeros) of the quadratic function 3 * x**2 + 2*x -1")



> Entering new AgentExecutor chain...
I need to solve a quadratic equation
Action: Python REPL
Action Input: import numpy as np
Observation: 
Thought: I can use the numpy function to solve the equation
Action: Python REPL
Action Input: np.roots([3,2,-1])
Observation: 
Thought: I now know the final answer
Final Answer: (-1.0, 0.3333333333333333)

> Finished chain.


'(-1.0, 0.3333333333333333)'
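Both Observation lines in the log are empty, so the agent did not read its final answer back from the REPL. It is therefore worth checking the answer independently. For `3*x**2 + 2*x - 1` the discriminant is `2**2 - 4*3*(-1) = 16`, so `x = (-2 ± 4) / 6`, which gives `-1` and `1/3`. The small sketch below applies the quadratic formula directly. `quadratic_roots` is a hypothetical helper.

```python
import math


def quadratic_roots(a: float, b: float, c: float) -> tuple:
    """Real roots of a*x**2 + b*x + c = 0 via the quadratic formula, sorted ascending."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic")
    disc = b * b - 4 * a * c
    if disc < 0:
        return ()  # no real roots
    sqrt_disc = math.sqrt(disc)
    # A set removes the duplicate when the discriminant is zero (double root).
    return tuple(sorted({(-b - sqrt_disc) / (2 * a), (-b + sqrt_disc) / (2 * a)}))


print(quadratic_roots(3, 2, -1))  # (-1.0, 0.3333333333333333)
```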