# LangChain Overview 

[LangChain](https://python.langchain.com/docs/get_started/introduction) is a framework for developing applications powered by language models. 

The main value propositions of LangChain are:

1. **Components**: abstractions for working with language models, along with a collection of implementations for each abstraction.
2. **Off-the-shelf chains**: structured assemblies of components for accomplishing specific higher-level tasks.

Off-the-shelf chains make it easy to get started. For complex applications, components make it easy to customize existing chains and build new ones.

In [1]:
!pip install -q openai langchain huggingface_hub python-dotenv


In [2]:
# Load environment variables (e.g. the OpenAI API key) from a .env file
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) 

There are two types of language models, which in LangChain are called:

* LLMs: language models that take a string as input and return a string
* ChatModels: language models that take a list of messages as input and return a message
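
The difference between the two interfaces can be sketched in plain Python. This is only an illustration: `Message`, `EchoLLM`, and `EchoChatModel` are hypothetical stand-ins, not LangChain classes, and no real model is called.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """Hypothetical stand-in for a chat message: a role plus its text."""
    role: str      # "system", "human", or "ai"
    content: str


class EchoLLM:
    """Toy LLM-style interface: string in, string out."""

    def __call__(self, prompt: str) -> str:
        return f"(completion for: {prompt})"


class EchoChatModel:
    """Toy chat-style interface: list of messages in, one message out."""

    def __call__(self, messages: List[Message]) -> Message:
        # Reply to the most recent human message in the conversation.
        last_human = [m for m in messages if m.role == "human"][-1]
        return Message(role="ai", content=f"(reply to: {last_human.content})")


llm_out = EchoLLM()("Why did the chicken cross the road?")
chat_out = EchoChatModel()([
    Message("system", "You are a helpful assistant"),
    Message("human", "Why did the chicken cross the road?"),
])
print(type(llm_out).__name__, "|", chat_out.role)
```

The real classes used in the following sections follow the same shape: an LLM is called with a string, a chat model with a list of messages.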




# 1.1. Build a first LLM

In [34]:
from langchain.llms import OpenAI

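# Similar to calling the API directly.
# Note: OpenAI has since retired 'text-davinci-003'; 'gpt-3.5-turbo-instruct' is its suggested replacement.
llm = OpenAI(
    model_name="text-davinci-003",
    temperature=0.9,
    max_tokens=256,
)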

text = "Why did the chicken cross the road?"
output = llm(text)
print(output)



To get to the other side.


# 1.2. Chat model

A *chat model* takes a list of ChatMessages as an input and returns a ChatMessage.

There are three types of messages:

* `SystemMessage`: a chat message carrying instructions for the AI system.
* `HumanMessage`: a chat message coming from a human interacting with the AI system.
* `AIMessage`: a chat message coming from the AI system.

In [4]:
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

from langchain.chat_models import ChatOpenAI

In [6]:
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)
messages = [
    SystemMessage(content="You are an expert data scientist"),
    HumanMessage(content="Write a Python script that trains a neural network on simulated data")
]
response = chat(messages)


In [8]:
print(response.content, end='\n')

Sure! Here's an example of a Python script that trains a neural network on simulated data using the Keras library:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Generate simulated data
np.random.seed(0)
X = np.random.rand(100, 2)
y = np.random.randint(2, size=100)

# Define the neural network model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=10, batch_size=10)

# Evaluate the model
loss, accuracy = model.evaluate(X, y)
print(f"Loss: {loss}")
print(f"Accuracy: {accuracy}")
```

In this script, we first generate simulated data using `np.random.rand` and `np.random.randint`. The input data `X` is a 2-dimensional array of random values between 0 and 1, and the target labels `y` are randomly assigned binary values.

Next, we 

# 2. Prompts

A "prompt" refers to the input to the model.

In [10]:
from langchain import PromptTemplate

template="""
You are an expert data scientist with expertise in building deep learning models.
Explain the concept of {concept} in a couple of lines
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template
)

In [11]:
prompt

PromptTemplate(input_variables=['concept'], output_parser=None, partial_variables={}, template='\nYou are an expert data scientist with expertise in building deep learning models.\nExplain the concept of {concept} in a couple of lines\n', template_format='f-string', validate_template=True)
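
The repr above reports `template_format='f-string'`, meaning placeholders such as `{concept}` use Python's standard format-string syntax. As a rough plain-Python illustration (not LangChain's implementation), `str.format` produces the same kind of filled-in prompt. `fill_template` and `demo_template` are hypothetical names introduced here; the helper also fails early when a variable is missing.

```python
from string import Formatter

demo_template = (
    "You are an expert data scientist.\n"
    "Explain the concept of {concept} in a couple of lines\n"
)


def fill_template(template_text: str, **values: str) -> str:
    """Fill {placeholders} in template_text, failing early if one is missing."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    expected = {name for _, name, _, _ in Formatter().parse(template_text) if name}
    missing = expected - set(values)
    if missing:
        raise KeyError(f"missing template variables: {sorted(missing)}")
    return template_text.format(**values)


filled = fill_template(demo_template, concept="regularization")
print(filled)
```

Keeping the template separate from the values is what lets the same prompt be reused for different concepts in the cells that follow.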

In [13]:
llm(prompt.format(concept="regularization"))

'\nRegularization is a technique used to prevent overfitting in deep learning models by penalizing large weights and encouraging generalization to unseen data. This is done by adding a “regularization term” to the loss function, which penalizes overly large weights and encourages the model to use simpler models with fewer parameters.'

In [14]:
llm(prompt.format(concept="autoencoder"))

'\nAn autoencoder is a type of artificial neural network used for unsupervised learning, which attempts to reconstruct its input in an efficient and compressed form. It uses an encoder to learn an efficient representation of input data in a lower-dimensional latent space, and a decoder to reconstruct the input data from the encoded representation. They are used for dimensionality reduction, image denoising, and can also be used as generative models.'

# 3. Chains

A chain is an end-to-end wrapper around multiple individual components. Here, an `LLMChain` combines the prompt template and the LLM defined above.

In [15]:
from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run("autoencoder"))


Autoencoders are a type of artificial neural network that can be used to learn important features from unlabeled data in an unsupervised manner. They are composed of an encoder, which compresses inputs into a low-dimensional representation, and a decoder, which reconstructs the representation back into the input space. This process helps capture important features of the data and can be used for dimensionality reduction, feature extraction, feature learning, and more.
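
Conceptually, the chain used above does two things in order: it fills the prompt template, then passes the result to the model. A minimal sketch of that idea, assuming a made-up `TinyChain` class and a `fake_llm` function that stands in for a real model (neither is part of LangChain):

```python
from typing import Callable


class TinyChain:
    """Hypothetical miniature chain: format a prompt, then call a model."""

    def __init__(self, llm: Callable[[str], str], template: str, variable: str):
        self.llm = llm
        self.template = template
        self.variable = variable

    def run(self, value: str) -> str:
        # Step 1: fill the single template variable.
        prompt_text = self.template.format(**{self.variable: value})
        # Step 2: hand the finished prompt to the model.
        return self.llm(prompt_text)


def fake_llm(prompt_text: str) -> str:
    """Stand-in for a real model: reports the prompt length instead of answering."""
    return f"[model saw {len(prompt_text)} characters]"


tiny_chain = TinyChain(fake_llm, "Explain the concept of {concept} briefly", "concept")
print(tiny_chain.run("autoencoder"))
```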


# 3.1. Nested prompts

In [19]:
second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Turn the concept description of {ml_concept} and explain it to me like I'm five in 500 words"
)
chain_two = LLMChain(llm=llm, prompt=second_prompt)

In [20]:
from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

explanation = overall_chain.run("autoencoder")



> Entering new SimpleSequentialChain chain...

An autoencoder is a type of artificial neural network used for unsupervised learning, which can be used to represent complicated input data such as images, videos, or speech signals as simpler representations called "encodings". This process is known as feature extraction or dimensionality reduction. Autoencoders can also be used for anomaly detection and generative modeling.

An autoencoder is like a translator. It takes information in a complicated form, such as a picture, a video or a sound, and it changes it into a simpler form. This is known as “encoding” the information. It’s like a magical machine that can take a complex sentence and turn it into a few simple words!

To use an autoencoder, we give it information in a complicated form like a picture or sound. Then it “encodes” it, or translates it into something simpler. For example, it might take a complex picture which is made up of lots of 
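
`SimpleSequentialChain` feeds the single output of each step into the next step as its single input, which is why the trace above shows the technical description first and the simplified version second. The data flow can be sketched without any model calls. `run_sequentially`, `describe`, and `simplify` are hypothetical stand-ins for the two LLM-backed chains.

```python
from functools import reduce
from typing import Callable, List

Step = Callable[[str], str]


def run_sequentially(steps: List[Step], first_input: str) -> str:
    """Feed each step's output into the next step and return the last output."""
    return reduce(lambda text, step: step(text), steps, first_input)


def describe(concept: str) -> str:
    """Stand-in for the first chain: produce a description of the concept."""
    return f"A technical description of {concept}."


def simplify(description: str) -> str:
    """Stand-in for the second chain: rewrite the description more simply."""
    return f"Explained simply: {description.lower()}"


result = run_sequentially([describe, simplify], "autoencoder")
print(result)
```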

In [23]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the explanation into chunks of at most 100 characters, with no overlap
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=0
)

texts = text_splitter.create_documents([explanation])

In [24]:
texts

[Document(page_content='An autoencoder is like a translator. It takes information in a complicated form, such as a picture,', metadata={}),
 Document(page_content='a video or a sound, and it changes it into a simpler form. This is known as “encoding” the', metadata={}),
 Document(page_content='information. It’s like a magical machine that can take a complex sentence and turn it into a few', metadata={}),
 Document(page_content='simple words!', metadata={}),
 Document(page_content='To use an autoencoder, we give it information in a complicated form like a picture or sound. Then', metadata={}),
 Document(page_content='it “encodes” it, or translates it into something simpler. For example, it might take a complex', metadata={}),
 Document(page_content='picture which is made up of lots of little coloured squares and turn it into a few numbers which', metadata={}),
 Document(page_content='describe the image.', metadata={}),
 Document(page_content='Autoencoders can also be used for “anomaly d

In [25]:
texts[0].page_content

'An autoencoder is like a translator. It takes information in a complicated form, such as a picture,'
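
The chunks above are all at most 100 characters long, matching `chunk_size=100`. The real splitter is more elaborate than what follows. As a simplified sketch of the basic idea only, the hypothetical `split_by_words` helper below greedily packs whole words into chunks that stay within the size limit, with no overlap.

```python
from typing import List


def split_by_words(text: str, chunk_size: int) -> List[str]:
    """Greedily pack whitespace-separated words into chunks of at most chunk_size characters."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else f"{current} {word}"
        if len(candidate) <= chunk_size or not current:
            # Either the word fits, or it is a single over-long word kept whole.
            current = candidate
        else:
            # The word does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


sample = "An autoencoder is like a translator. " * 5
pieces = split_by_words(sample, chunk_size=100)
print([len(piece) for piece in pieces])
```

Splitting on word boundaries keeps each chunk readable, at the cost of chunks that are usually a little shorter than the limit.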

In [29]:
!pip install tiktoken

Collecting tiktoken
  Downloading tiktoken-0.5.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.5.1


In [27]:
from langchain.embeddings import OpenAIEmbeddings

# The keyword argument is `model`; `model_name` is not a recognized field and is moved into model_kwargs with a warning
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")


In [30]:
query_result = embeddings.embed_query(texts[0].page_content)
query_result

[-0.0425466536451634,
 0.0010285452607304234,
 0.010034588545590245,
 0.004037249611533585,
 0.011432741140528255,
 0.011084875414215233,
 -0.009593066232351784,
 -0.024069632012457716,
 0.005104260704373378,
 -0.04096787942870548,
 0.003053859872598454,
 0.04364377020423489,
 0.01692500617499145,
 -0.002015279960923831,
 0.02807008879637999,
 -0.01739328631565108,
 0.000556501529338229,
 0.009118096402006233,
 -0.01141936176115641,
 -0.009907483510235193,
 -0.022290165242777073,
 0.028846096525237102,
 -0.007191455062251508,
 -0.024310462703795978,
 -0.00854946905341272,
 0.021259947443209852,
 0.014195597881974659,
 -0.03106708653947873,
 0.003017066346495249,
 -0.005335056395521493,
 0.018329848459615383,
 -0.03566961539513404,
 -0.010522938686630165,
 -0.047818158919991145,
 -0.03101356902199135,
 -0.001459196362567999,
 0.01673769300114057,
 -0.03079949708939678,
 -0.003187654364808798,
 0.009258579885410607,
 0.03409084117867612,
 0.0009031129388351401,
 0.003930213645236299,
 -0
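
An embedding is a list of floats like the one above. A common way to use such vectors is to compare them, for example with cosine similarity. The sketch below is an assumption about how the vectors might be used, not something this notebook does. It relies on tiny made-up vectors rather than real embeddings, and `cosine_similarity` is a helper written here for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors standing in for real embeddings.
vec_a = [0.1, 0.3, -0.2]
vec_b = [0.2, 0.6, -0.4]   # same direction as vec_a
vec_c = [-0.3, 0.1, 0.0]   # orthogonal to vec_a

print(round(cosine_similarity(vec_a, vec_b), 6))
print(round(cosine_similarity(vec_a, vec_c), 6))
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values close to 0 indicate unrelated directions.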

# 4. Agents
Some applications require not just a predetermined chain of calls to LLMs and other tools, but potentially an unknown chain that depends on the user's input.

In these types of chains, there is an "agent" that *has access to a suite of tools*.

Depending on the user input, the agent can then decide which, if any, of these tools to call.
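
The decision step can be sketched in plain Python. In a real agent the language model chooses the tool. Here a hand-written `choose_tool` function plays that role, and the two toy tools are invented for illustration. None of these names come from LangChain.

```python
from typing import Callable, Dict, Optional


def calculator(expression: str) -> str:
    """Toy tool: add integers written as 'a + b + ...'."""
    return str(sum(int(part) for part in expression.split("+")))


def shout(text: str) -> str:
    """Toy tool: upper-case the text."""
    return text.upper()


TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator, "shout": shout}


def choose_tool(user_input: str) -> Optional[str]:
    """Stand-in for the LLM's decision: return a tool name, or None for no tool."""
    if "+" in user_input:
        return "calculator"
    if user_input.startswith("shout:"):
        return "shout"
    return None


def run_agent(user_input: str) -> str:
    """Pick a tool based on the input, call it, and return its result."""
    tool_name = choose_tool(user_input)
    if tool_name is None:
        return f"(answered directly) {user_input}"
    argument = user_input.split(":", 1)[1] if tool_name == "shout" else user_input
    return TOOLS[tool_name](argument)


print(run_agent("2 + 3 + 4"))
print(run_agent("shout:hello"))
print(run_agent("What is an agent?"))
```

The sequence of calls is not fixed in advance: the same `run_agent` entry point ends up calling a different tool, or none at all, depending on the input.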

In [32]:
from langchain.agents.agent_toolkits import create_python_agent
from langchain.tools.python.tool import PythonREPLTool
from langchain.llms.openai import OpenAI

agent_executor = create_python_agent(
    llm=OpenAI(temperature=0, max_tokens=1000),
    tool=PythonREPLTool(),
    verbose=True
)



In [33]:
agent_executor.run("Find the roots (zeros) of the quadratic function 3 * x**2 + 2*x - 1")



> Entering new AgentExecutor chain...

 I need to solve a quadratic equation
Action: Python_REPL
Action Input: import numpy as np
Observation: 
Thought: I can use numpy to solve the equation
Action: Python_REPL
Action Input: np.roots([3,2,-1])
Observation: 
Thought: I now know the final answer
Final Answer: The roots of the equation are -1 and 0.33333

> Finished chain.


'The roots of the equation are -1 and 0.33333'
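
Both `Observation:` lines in the trace above are empty, so it is worth checking the agent's final answer independently. For 3x² + 2x − 1 the discriminant is 2² − 4·3·(−1) = 16, which gives roots (−2 ± 4)/6, that is −1 and 1/3. A minimal check with the quadratic formula, where `quadratic_roots` is a helper introduced here for illustration:

```python
import math
from typing import Tuple


def quadratic_roots(a: float, b: float, c: float) -> Tuple[float, float]:
    """Real roots of a*x**2 + b*x + c = 0 via the quadratic formula."""
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        raise ValueError("no real roots")
    sqrt_d = math.sqrt(discriminant)
    return ((-b - sqrt_d) / (2 * a), (-b + sqrt_d) / (2 * a))


roots = quadratic_roots(3, 2, -1)
print(roots)
```

The result agrees with the agent's answer of -1 and 0.33333.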