# AI Assistant: LangChain Tutorial

 Alejandro Ricciardi (Omegapy)  
 created date: 12/22/2023
 GitHub: https://github.com/Omegapy

Project Description:
A LangChain tutorial based on Rabbitmetrics' YouTube video: https://www.youtube.com/watch?v=aywZrzNaKjs


### API Keys

This project requires API keys from:
OpenAI: https://openai.com/
Pinecone: https://www.pinecone.io/ 
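The keys are read from environment variables, which `load_dotenv` fills from a `.env` file of `NAME=value` lines. A minimal sketch for checking that the keys are present before running the cells below. `PINECONE_API_KEY` and `PINECONE_ENV` are the names read later in this notebook; `OPENAI_API_KEY` is assumed here as the conventional name for the OpenAI key, and `missing_keys` is a hypothetical helper, not part of any library.

```python
import os

# PINECONE_API_KEY and PINECONE_ENV are read later in this notebook.
# OPENAI_API_KEY is an assumption: the conventional variable name for the OpenAI key.
REQUIRED_KEYS = ("OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENV")


def missing_keys(required=REQUIRED_KEYS, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_keys()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required keys are set.")
```

Running this right after `load_dotenv(find_dotenv())` should surface a misnamed or missing entry in `.env` before any API call fails.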

In [22]:
# Load environment variables API Keys

from dotenv import load_dotenv,find_dotenv
load_dotenv(find_dotenv())
     

True

### Basic Query
Utilizing a Large Language Model (LLM)

In [2]:
# Run basic query with OpenAI wrapper

from langchain.llms import OpenAI
llm_davinci = OpenAI(model_name="text-davinci-003")
llm_davinci("explain large language models text-davinci-003")

'\n\nLarge language models are deep learning models that are trained on a large corpus of text to learn the underlying structure of language. They have been developed to better understand and generate natural language, and have been used in many applications such as machine translation, question answering, and summarization. They are especially useful in tasks such as sentiment analysis and text classification. By training on a large corpus of text, the model can learn more complex relationships between words, phrases, and concepts. This enables it to better understand the overall context of a sentence or document, and to produce more accurate results.'

### Chat Model
Utilizing a Large Language Model (LLM) as a chat model

In [3]:
# import schema for chat messages and ChatOpenAI in order to query chat models GPT-3.5-turbo or GPT-4

from langchain.chat_models import ChatOpenAI # Ex: gpt-3.5-turbo

from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

In [4]:
chat_llm_gpt3 = ChatOpenAI(model_name="gpt-3.5-turbo",temperature=0.3)
messages = [
    SystemMessage(content="You are an expert data scientist"),
    HumanMessage(content="Write a Python script that trains a neural network on simulated data ")
]
response=chat_llm_gpt3(messages)

print(response.content,end='\n')
     

Sure! Here's an example of a Python script that trains a neural network on simulated data using the Keras library:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Generate simulated data
np.random.seed(0)
X = np.random.rand(100, 2)
y = np.random.randint(2, size=(100, 1))

# Define the neural network model
model = Sequential()
model.add(Dense(10, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=100, batch_size=10)

# Evaluate the model
loss, accuracy = model.evaluate(X, y)
print(f"Loss: {loss}")
print(f"Accuracy: {accuracy}")
```

In this script, we first generate simulated data using `numpy`. We then define a simple neural network model using `Sequential` from Keras. The model consists of two dense layers, with 10 units in the first layer and 1 unit in the output lay

### Prompts with a Basic LLM


In [5]:
# Import prompt and define PromptTemplate

from langchain import PromptTemplate

template = """
You are an expert data scientist with an expertise in building deep learning models. 
Explain the concept of {concept} in a couple of lines
"""

prompt_first = PromptTemplate(
    input_variables=["concept"],
    template=template,
)

#### Prompt Templates

In [6]:
# Run LLM with PromptTemplate

llm_davinci(prompt_first.format(concept="regression"))

'\nRegression is a type of machine learning algorithm that can be used to predict a continuous numerical value given a set of input variables. It is a supervised learning technique which uses historical data to build a model that can predict future outcomes based on new input data.'

#### Chains
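A chain ties a prompt template to an LLM so it can be run with only the input variable, and several chains can be linked so that each LLM call feeds the next.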
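To make the idea concrete without an API key, here is a rough plain-Python sketch of what a single chain does: fill the template, then pass the resulting prompt to the model. `TinyChain` and `fake_llm` are hypothetical stand-ins written for this illustration; they are not LangChain classes and do not reflect how `LLMChain` is implemented.

```python
from typing import Callable


class TinyChain:
    """Hypothetical stand-in for a chain: a prompt template plus a model callable."""

    def __init__(self, llm: Callable[[str], str], template: str, input_variable: str):
        self.llm = llm
        self.template = template
        self.input_variable = input_variable

    def run(self, value: str) -> str:
        # Step 1: fill the template with the single input variable.
        prompt = self.template.format(**{self.input_variable: value})
        # Step 2: send the finished prompt to the model and return its reply.
        return self.llm(prompt)


def fake_llm(prompt: str) -> str:
    # Stub model: echoes the prompt so the sketch runs offline.
    return f"[model reply to: {prompt.strip()}]"


chain = TinyChain(
    llm=fake_llm,
    template="Explain the concept of {concept} in a couple of lines",
    input_variable="concept",
)
print(chain.run("regression"))
```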

In [7]:
# Import LLMChain and define chain with language model and prompt as arguments.

from langchain.chains import LLMChain

chain_one = LLMChain(llm=llm_davinci, prompt=prompt_first) # you can choose which LLM generates the response

# Run the chain only specifying the input variable.
print(chain_one.run("regression"))


Regression is a type of supervised machine learning algorithm which is used to predict a continuous, numerical output for a given set of input data. It is used to estimate the relationship between a dependent variable (target) and one or more independent variables (features). The output generated by a regression model can be used to make predictions about future values of the target variable.


In [8]:
# Define a second prompt 

prompt_second = PromptTemplate(
    input_variables=["ml_concept"],
    template="Turn the concept description of {ml_concept} and explain it to me like I'm five in 500 words",
)

chain_two = LLMChain(llm=llm_davinci, prompt=prompt_second)

#### Example of Simple Sequential Chaining

For an example of more general (not simple) sequential chains, see the Sequential-Demo.ipynb notebook.
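Simple sequential chaining amounts to function composition: each step receives the previous step's single output as its single input. A minimal sketch of that data flow, using two hypothetical stub steps (`describe` and `simplify`) in place of real LLM calls:

```python
from functools import reduce
from typing import Callable, Sequence

Step = Callable[[str], str]


def run_sequentially(steps: Sequence[Step], first_input: str) -> str:
    """Feed the output of each step into the next one and return the last output."""
    return reduce(lambda text, step: step(text), steps, first_input)


def describe(concept: str) -> str:
    # Stub for the first chain: turns a concept name into a description.
    return f"{concept} predicts a continuous number from input data."


def simplify(description: str) -> str:
    # Stub for the second chain: rewrites the description it is given.
    return "In simple words: " + description


print(run_sequentially([describe, simplify], "Regression"))
```

Only the first step sees the original input; every later step sees nothing but the text produced just before it.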

In [9]:
# Example of SimpleSequentialChain
# Define simple sequential chain using the two chains above: the second chain takes the output of the first chain as input
from langchain.chains import SimpleSequentialChain 
overall_chain = SimpleSequentialChain(chains=[chain_one, chain_two], verbose=True) # Note the second chain takes the output of the first chain as input

# Run the chain specifying only the input variable for the first chain.
explanation = overall_chain.run("regression")
print(explanation)

> Entering new SimpleSequentialChain chain...

Regression is a type of supervised machine learning algorithm that is used to predict a continuous numerical output. It uses data such as independent variables and their relationship to a dependent variable to create a model that can be used to make predictions about new data points.


Regression is a type of math that can be used to predict a number. It uses data that you have to try to guess what a new number might be.

To start, imagine you have a box full of marbles. The box contains different colors of marbles and you want to know how many of each color there are. To do this, you could use something called regression.

The first step is to figure out the relationships between the colors. To do this, you might look at how often each color appears in the box. You could also look at the pattern of colors in the box. From this, you can create a model to make predictions about the amount of each colo

### Chunk Files

In [10]:
# Import utility for splitting up texts and split up the explanation given above into document chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 100,
    chunk_overlap  = 7,
)

texts = text_splitter.create_documents([explanation])

In [11]:
# Individual text chunks can be accessed with "page_content"

texts[0].page_content

'Regression is a type of math that can be used to predict a number. It uses data that you have to'

In [12]:
# Individual text chunks can be accessed with "page_content"

texts[1].page_content

'to try to guess what a new number might be.'
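The two parameters used above, `chunk_size` and `chunk_overlap`, can be illustrated with a simplified sliding-window splitter. This is only a sketch of the overlap idea: the real `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs, newlines and spaces, which is why the chunks above end on word boundaries, whereas this hypothetical `split_with_overlap` cuts at fixed character positions.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 100, chunk_overlap: int = 7) -> List[str]:
    """Cut `text` into windows of `chunk_size` characters that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


# Small numbers make the overlap visible: neighbouring chunks share 2 characters.
print(split_with_overlap("abcdefghij", chunk_size=5, chunk_overlap=2))
```

The overlap keeps a little shared context between neighbouring chunks, so a sentence cut at a boundary is not lost entirely to either side.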

### Vector embedding

In [24]:
# Import and instantiate OpenAI embeddings
# what is embedding https://platform.openai.com/docs/guides/embeddings/what-are-embeddings

from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

In [25]:
# Turn the first text chunk into a vector with the embedding
# requires pip install tiktoken
# embed_query takes a single string; embed_documents expects a list of strings

query_result = embeddings.embed_query(texts[0].page_content)
#print(query_result)

### Vector embedding using Pinecone and LangChain

In [26]:
# Import and initialize Pinecone client
from tqdm.autonotebook import tqdm # jupyter notebook specific 

import os
import pinecone
from langchain.vectorstores import Pinecone

# Init pinecone
pinecone.init(
    api_key=os.getenv('PINECONE_API_KEY'),  
    environment=os.getenv('PINECONE_ENV')  
)

In [27]:
# Upload vectors to Pinecone, indexing for search purposes
index_name='ai-assistant'
#pinecone.create_index(name=index_name, metric="cosine", dimension=1536) # need to upgrade pinecone to use it

text_vec = Pinecone.from_documents(texts, embeddings, index_name=index_name)
     

#### Query using vectors

In [28]:
# Do a simple vector similarity search

query = "What is magical about regression?"
result_text_vec = text_vec.similarity_search(query)

print(result_text_vec)
     

[Document(page_content='Regression can also help us to make predictions. Let’s say you want to know how much a certain car'), Document(page_content='Regression can also help us to make predictions. Let’s say you want to know how much a certain car'), Document(page_content='Regression can also help us to make predictions. Let’s say you want to know how much a certain car'), Document(page_content='The same concept can be used to look at other relationships. For instance, we can use regression to')]
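Conceptually, a similarity search embeds the query and ranks the stored chunk vectors by how close they are to it. A minimal in-memory sketch, assuming cosine similarity (the metric named in the commented-out `create_index` call above) and tiny hand-made 2-dimensional vectors in place of real 1536-dimensional embeddings. `cosine_similarity` and `top_k` are hypothetical helpers, not Pinecone or LangChain functions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]], k: int = 4) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors standing in for embedded text chunks.
doc_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vector = [1.0, 0.1]

for index, score in top_k(query_vector, doc_vectors, k=2):
    print(index, round(score, 3))
```

A vector database does the same ranking at scale, using an index instead of comparing the query against every stored vector.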


## Agent: Python Code Interpreter

In [30]:
# Agent example: Python code interpreter
# Import the Python REPL tool and instantiate a Python agent

# These imports come from the langchain_experimental package (experimental stage)
from langchain_experimental.agents.agent_toolkits import create_python_agent
from langchain_experimental.tools.python.tool import PythonREPLTool

from langchain.python import PythonREPL
from langchain.llms.openai import OpenAI

agent_executor = create_python_agent(
    llm=OpenAI(temperature=0, max_tokens=1000),
    tool=PythonREPLTool(),
    verbose=True
)

In [31]:
agent_executor.run("What is the 10th fibonacci number?")

> Entering new AgentExecutor chain...

Python REPL can execute arbitrary code. Use with caution.

I can use a loop to calculate the fibonacci sequence and return the 10th number.
Action: Python_REPL
Action Input: n = 10
Observation:
Thought: I can use a loop to calculate the fibonacci sequence and return the 10th number.
Action: Python_REPL
Action Input: a = 0
Observation:
Thought: I can use a loop to calculate the fibonacci sequence and return the 10th number.
Action: Python_REPL
Action Input: b = 1
Observation:
Thought: I can use a loop to calculate the fibonacci sequence and return the 10th number.
Action: Python_REPL
Action Input: for i in range(n):
Observation: IndentationError("expected an indented block after 'for' statement on line 1", ('<string>', 1, 19, 'for i in range(n):\n', 1, -1))
Thought: I need to indent the code after the for loop.
Action: Python_REPL
Action Input: for i in range(n):
Observation: I

KeyboardInterrupt: 
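In the trace above, the agent sends its code to the REPL one line at a time, so the bare `for` header raises an `IndentationError` and the run was interrupted by hand. For comparison, a minimal sketch of a complete snippet for the same task, assuming the common convention F(1) = F(2) = 1 (the prompt does not say which indexing is meant):

```python
def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number, assuming F(1) = F(2) = 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    a, b = 0, 1
    for _ in range(n):
        # Advance one position in the sequence: (F(i), F(i+1)) -> (F(i+1), F(i+2)).
        a, b = b, a + b
    return a


print(fibonacci(10))  # prints 55 under the F(1) = F(2) = 1 convention
```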