# LangChain Project Tutorial



In [1]:
# Load environment variables

from dotenv import load_dotenv,find_dotenv
load_dotenv(find_dotenv())
     

True
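
The `True` above only says that a `.env` file was found and loaded. A minimal sketch of reading one of the variables afterwards and failing early if it is missing is shown below; `require_env` is a hypothetical helper, not part of `python-dotenv` or LangChain, and the demo uses an in-memory mapping instead of the real environment.

```python
import os
from typing import Mapping


def require_env(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the value of an environment variable or raise if it is unset or empty."""
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Demonstration with a placeholder mapping (assumption: in the notebook you
# would call require_env("PINECONE_API_KEY") against the real environment).
demo_env = {"PINECONE_API_KEY": "placeholder-value"}
key = require_env("PINECONE_API_KEY", demo_env)
print(key)
```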

In [2]:
# Run basic query with OpenAI wrapper

from langchain.llms import OpenAI
llm_davinci = OpenAI(model_name="text-davinci-003")
llm_davinci("explain large language models text-davinci-003")

'\n\nLarge language models are a type of artificial intelligence (AI) technology that uses deep learning algorithms to generate text. They are trained on large datasets of text to learn the patterns of language, allowing them to generate text that is more natural and human-like than traditional AI models. Large language models are being used to create virtual assistants, to generate news articles, to create dialogue for video games, and for many other applications. The Text-Davinci-003 model is a deep learning-based natural language processing (NLP) model that utilizes the Transformer architecture. It is designed to be able to generate human-like text from a given prompt. The model is trained on a large dataset of text, and can generate text that is more natural than traditional AI models.'

In [3]:
# Import the chat message schema and ChatOpenAI to query chat models such as GPT-3.5-turbo or GPT-4

from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
from langchain.chat_models import ChatOpenAI

In [8]:
chat_llm_gpt3 = ChatOpenAI(model_name="gpt-3.5-turbo",temperature=0.3)
messages = [
    SystemMessage(content="You are an expert data scientist"),
    HumanMessage(content="Write a Python script that trains a neural network on simulated data ")
]
response=chat_llm_gpt3(messages)

print(response.content,end='\n')
     

Sure! Here's an example of a Python script that trains a neural network on simulated data using the Keras library:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Generate simulated data
np.random.seed(0)
X = np.random.rand(100, 2)
y = np.random.randint(0, 2, size=(100, 1))

# Create a neural network model
model = Sequential()
model.add(Dense(10, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=10, batch_size=10)

# Evaluate the model
loss, accuracy = model.evaluate(X, y)
print('Loss:', loss)
print('Accuracy:', accuracy)
```

In this script, we first generate simulated data using `np.random.rand` and `np.random.randint`. We then create a neural network model using the `Sequential` class from Keras. The model consists of two dense layers, with 10 units in the first la

In [5]:
# Import prompt and define PromptTemplate

from langchain import PromptTemplate

template = """
You are an expert data scientist with an expertise in building deep learning models. 
Explain the concept of {concept} in a couple of lines
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template,
)

In [6]:
# Run LLM with PromptTemplate

llm_davinci(prompt.format(concept="regression"))

'\nRegression is a type of supervised machine learning technique used to predict a continuous numerical value (e.g. a price or quantity). It works by training a model on a set of input data and then using the model to make predictions about future data. Regression algorithms are used widely in many applications such as predicting stock prices, forecasting sales, and predicting house prices.'
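
As a rough illustration of what `prompt.format(concept="regression")` hands to the model, the sketch below performs the same placeholder substitution with Python's built-in `str.format`. It is a standard-library stand-in under the assumption that the template only uses simple `{name}` placeholders, not LangChain's implementation; `render_prompt` is a hypothetical helper name.

```python
# Same template text as in the cell above.
TEMPLATE = """
You are an expert data scientist with an expertise in building deep learning models.
Explain the concept of {concept} in a couple of lines
"""


def render_prompt(template: str, **variables: str) -> str:
    """Fill each {placeholder} in the template with the matching keyword argument."""
    return template.format(**variables)


rendered = render_prompt(TEMPLATE, concept="regression")
print(rendered)
```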

In [7]:
# Import LLMChain and define chain with language model and prompt as arguments.

from langchain.chains import LLMChain

chain = LLMChain(llm=llm_davinci, prompt=prompt) # any LLM object can be passed here to generate the response

# Run the chain only specifying the input variable.
print(chain.run("regression"))


Regression is a type of predictive modeling technique which is used to predict a continuous numerical value based on a given set of independent variables. It is used to understand the relationship between the independent variables and the dependent variable.


In [11]:
# Define a second prompt 

second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Turn the concept description of {ml_concept} and explain it to me like I'm five in 500 words",
)

chain_two = LLMChain(llm=llm_davinci, prompt=second_prompt)

In [13]:
# Example-1 SimpleSequentialChain
# Define simple sequential chain using the two chains above: the second chain takes the output of the first chain as input
from langchain.chains import SimpleSequentialChain 
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True) # Note the second chain takes the output of the first chain as input

# Run the chain specifying only the input variable for the first chain.
explanation = overall_chain.run("regression")
print(explanation)
     



> Entering new SimpleSequentialChain chain...

Regression is a technique used to analyze the relationship between a dependent variable and one or more independent variables. It is used to predict a numerical value (such as a price, probability, etc.) from a given set of data points. Regression models are used to estimate the correlations between different variables, and can provide insights into how these variables interact.

Regression is like a game of connect-the-dots. You have dots on a page that are like pieces of data, and you connect them together to understand the relationship between them. 

For example, let’s say you want to know how much a car costs. You would need to know the make and model of the car, its age, its condition, and other factors. All of this information would be the data points or dots. When you connect these dots, you create a line that shows the relationship between each of the data points and the price of the car.
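
The data flow of a simple sequential chain, where each step receives the previous step's output as its only input, can be sketched with plain functions. This is an illustration of the idea only, not LangChain's implementation; `run_sequentially`, `describe`, and `simplify` are hypothetical names, and the two steps build strings instead of calling a model.

```python
from typing import Callable, List

Step = Callable[[str], str]


def run_sequentially(steps: List[Step], first_input: str) -> str:
    """Feed the input to the first step, then pass each output on to the next step."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


# Hypothetical stand-ins for the two LLM-backed chains.
def describe(concept: str) -> str:
    return f"Description of {concept}"


def simplify(description: str) -> str:
    return f"Simple explanation of: {description}"


result = run_sequentially([describe, simplify], "regression")
print(result)
```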



In [13]:
# Example-2 SequentialChain
# Note: SequentialChain is imported below, but this cell still builds a SimpleSequentialChain, so it repeats Example-1
from langchain.chains import SequentialChain, SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True) # The second chain takes the output of the first chain as input

# Run the chain specifying only the input variable for the first chain.
explanation = overall_chain.run("regression")
print(explanation)
     



> Entering new SimpleSequentialChain chain...

Regression is a technique used to analyze the relationship between a dependent variable and one or more independent variables. It is used to predict a numerical value (such as a price, probability, etc.) from a given set of data points. Regression models are used to estimate the correlations between different variables, and can provide insights into how these variables interact.

Regression is like a game of connect-the-dots. You have dots on a page that are like pieces of data, and you connect them together to understand the relationship between them. 

For example, let’s say you want to know how much a car costs. You would need to know the make and model of the car, its age, its condition, and other factors. All of this information would be the data points or dots. When you connect these dots, you create a line that shows the relationship between each of the data points and the price of the car.



In [20]:
# Import utility for splitting up texts and split up the explanation given above into document chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 100,
    chunk_overlap  = 7,
)

texts = text_splitter.create_documents([explanation])

In [49]:
# Individual text chunks can be accessed with "page_content"

texts[0].page_content

'Regression is like a game of connect-the-dots. You have dots on a page that are like pieces of'

In [22]:
# Individual text chunks can be accessed with "page_content"

texts[1].page_content

'of data, and you connect them together to understand the relationship between them.'
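
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size splitting in which consecutive chunks share a few characters. It is deliberately simpler than `RecursiveCharacterTextSplitter`, which also prefers to break on separators such as newlines and spaces (which is why the chunks above end on word boundaries); `split_with_overlap` is a hypothetical helper.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk starts with the last three characters of the previous one, so text cut at a boundary still appears whole in at least one neighbour's context.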

In [65]:
# Import and instantiate OpenAI embeddings
# Background on embeddings: https://platform.openai.com/docs/guides/embeddings/what-are-embeddings

from langchain.embeddings import OpenAIEmbeddings

# The API key is read from the OPENAI_API_KEY environment variable loaded by load_dotenv above; never hardcode keys in a notebook
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

In [50]:
# Turn the first text chunk into a vector with the embedding model
# requires pip install tiktoken

# embed_query takes a single string; embed_documents expects a list of strings
query_result = embeddings.embed_query(texts[0].page_content)
#print(query_result)

In [33]:
# Import and initialize Pinecone client
from tqdm.autonotebook import tqdm # jupyter notebook specific 

import os
import pinecone
from langchain.vectorstores import Pinecone

# Init pinecone
pinecone.init(
    api_key=os.getenv('PINECONE_API_KEY'),  
    environment=os.getenv('PINECONE_ENV')  
)

In [73]:
# Upload vectors to Pinecone, indexing for search purposes
index_name='ai-assistant'
#pinecone.create_index(name=index_name, metric="cosine", dimension=1536) # need to upgrade pinecone

text_vec = Pinecone.from_documents(texts, embeddings, index_name=index_name)
     

In [74]:
# Do a simple vector similarity search

query = "What is magical about a regression?"
result_text_vec = text_vec.similarity_search(query)

print(result_text_vec)
     

[Document(page_content='Regression can also help us to make predictions. Let’s say you want to know how much a certain car'), Document(page_content='Regression can also help us to make predictions. Let’s say you want to know how much a certain car'), Document(page_content='The same concept can be used to look at other relationships. For instance, we can use regression to'), Document(page_content='The same concept can be used to look at other relationships. For instance, we can use regression to')]
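
A vector similarity search ranks stored vectors by how close they are to the query vector. The sketch below shows that ranking with cosine similarity on tiny hand-written vectors. It assumes a cosine metric (as in the commented-out `create_index` call) and uses made-up 3-dimensional vectors instead of real 1536-dimensional embeddings; `cosine_similarity` and `top_k` are hypothetical helpers, not Pinecone or LangChain APIs.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: List[float], index: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Return the k stored items most similar to the query, best match first."""
    scored = [(name, cosine_similarity(query, vector)) for name, vector in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy index: names and vectors are invented for illustration.
toy_index = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.7, 0.7, 0.0],
    "chunk_c": [0.0, 0.0, 1.0],
}
matches = top_k([1.0, 0.1, 0.0], toy_index, k=2)
print(matches)
```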


In [81]:
# Agent example
# Code interpreter for Python
# Import the Python REPL tool and instantiate a Python agent

# These agent tools are experimental and live in the langchain_experimental package
from langchain_experimental.agents.agent_toolkits import create_python_agent
from langchain_experimental.tools.python.tool import PythonREPLTool

from langchain.python import PythonREPL
from langchain.llms.openai import OpenAI

agent_executor = create_python_agent(
    llm=OpenAI(temperature=0, max_tokens=1000),
    tool=PythonREPLTool(),
    verbose=True
)

In [80]:
agent_executor.run("What is the 10th fibonacci number?")



> Entering new AgentExecutor chain...


Python REPL can execute arbitrary code. Use with caution.


I can use a loop to calculate the fibonacci sequence and return the 10th number.
Action: Python_REPL
Action Input: n = 10
Observation:
Thought: I can use a loop to calculate the fibonacci sequence and return the 10th number.
Action: Python_REPL
Action Input: a = 0
Observation:
Thought: I can use a loop to calculate the fibonacci sequence and return the 10th number.
Action: Python_REPL
Action Input: b = 1
Observation:
Thought: I can use a loop to calculate the fibonacci sequence and return the 10th number.
Action: Python_REPL
Action Input: for i in range(n):
Observation: IndentationError("expected an indented block after 'for' statement on line 1", ('<string>', 1, 19, 'for i in range(n):\n', 1, -1))
Thought: I need to indent the code after the for loop.
Action: Python_REPL
Action Input: for i in range(n):
Observation: I

'1'
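
In the trace above the agent submits its program one line at a time, so the bare `for` header fails with an `IndentationError`. For reference, the loop it was building can be written as one self-contained snippet. This is a sketch that assumes the indexing F(0) = 0, F(1) = 1; the trace is cut off, so it is not a record of the agent's final answer.

```python
def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number, with F(0) = 0 and F(1) = 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1
    for _ in range(n):
        # Advance one position in the sequence.
        a, b = b, a + b
    return a


tenth = fibonacci(10)
print(tenth)
```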