### What is LangChain?
#### An open-source framework that allows developers to combine LLMs like GPT-4 with external sources of computation and data.

#### The LangChain framework is offered as Python and TypeScript packages

### Why do we need LangChain?

The popularity of the framework has been exploding, especially since the introduction of GPT-4 in March 2023. To understand what need LangChain fills, let's look at a practical example. By now we all know that ChatGPT or GPT-4 has impressive general knowledge: we can ask it about almost anything and get a pretty good answer.

But suppose you want to know something specific from your own data, your own documents. It could be a book, a PDF file, or a database with proprietary information. LangChain allows you to connect a large language model like GPT-4 to your own sources of data.

In [1]:
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

True

In [5]:
import warnings
warnings.filterwarnings("ignore")

#### We are going to start with LLM wrappers

In [6]:
from langchain.llms import OpenAI
llm = OpenAI(model_name="gpt-3.5-turbo")
llm("explain large language models in one sentence")

'Large language models are highly advanced artificial intelligence systems capable of generating human-like text by analyzing a vast amount of data.'

In [7]:
from langchain.schema import(
    AIMessage,
    HumanMessage,
    SystemMessage
)
from langchain.chat_models import ChatOpenAI

In [9]:
chat = ChatOpenAI(model_name="gpt-3.5-turbo",temperature=0.3)
messages = [
    SystemMessage(content="You are an expert data scientist"),
    HumanMessage(content="Write a python script that trains a neural network on simulated data")
]
response = chat(messages)

In [10]:
print(response.content,end="\n")

Sure! Here's an example of a Python script that trains a neural network on simulated data using the Keras library:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Generate simulated data
np.random.seed(0)
X = np.random.rand(100, 2)
y = np.random.randint(2, size=100)

# Define the neural network model
model = Sequential()
model.add(Dense(10, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=100, batch_size=10)

# Evaluate the model
loss, accuracy = model.evaluate(X, y)
print(f'Loss: {loss}, Accuracy: {accuracy}')
```

In this script, we first generate simulated data using `numpy.random.rand` and `numpy.random.randint` functions. The input data `X` is a 2-dimensional array of random numbers between 0 and 1, and the target labels `y` are binary values (0 or 1).

Next, we

### Concept 2: Prompts

In [14]:
from langchain import PromptTemplate

template = """
You are an expert data scientist with expertise in building deep learning models.
Explain the concept of {concept} in couple of lines
"""
prompt = PromptTemplate(
    input_variables=["concept"],
    template=template
)

In [15]:
prompt

PromptTemplate(input_variables=['concept'], template='\nYou are an expert data scientist with expertise in building deep learning models.\nExplain the concept of {concept} in couple of lines\n')

In [16]:
llm(prompt.format(concept="backpropagation"))

'Backpropagation is a widely used algorithm in deep learning that enables the update of model parameters by calculating gradients. It works by propagating the error information from the output layer back to the input layer, allowing the model to learn from its mistakes and adjust its weights and biases accordingly to improve its overall performance.'

In [17]:
llm(prompt.format(concept="autoencoders"))

'Autoencoders are unsupervised learning models that learn to encode input data into a lower-dimensional representation and then decode it back to reconstruct the original input. They consist of an encoder network that compresses the data and a decoder network that reconstructs it, with the goal being to minimize the difference between the original input and the reconstructed output. Autoencoders can be used for tasks like data compression, denoising, and anomaly detection.'
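At its core, a prompt template is string formatting with named placeholders. The sketch below shows that idea in plain Python, without LangChain; `render_prompt` is a hypothetical helper written for illustration, not part of the library, and the template wording is only an example.

```python
# Plain-Python sketch of what a prompt template does.
# This is an illustration, not LangChain's implementation.
TEMPLATE = (
    "You are an expert data scientist with expertise in building deep learning models.\n"
    "Explain the concept of {concept} in a couple of lines"
)


def render_prompt(template: str, **variables: str) -> str:
    """Fill the named placeholders in `template` (hypothetical helper)."""
    return template.format(**variables)


rendered = render_prompt(TEMPLATE, concept="backpropagation")
print(rendered)
```

The same template can be reused with any value for `concept`, which is what makes templates convenient when the surrounding instructions stay fixed.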

### Concept 3: Chains


In [18]:
from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain, specifying only the input variable
print(chain.run("autoencoders"))

Autoencoders are a type of artificial neural network used for unsupervised learning that aim to encode and then reconstruct input data. By compressing the input data into a lower-dimensional code and then decoding it back to its original form, autoencoders learn to capture the most important features of the data, making them useful for tasks like dimensionality reduction, anomaly detection, and image generation.
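Conceptually, a chain of this kind formats the prompt with the input and then sends the result to the model. Here is a minimal sketch of that flow in plain Python, assuming a stub in place of a real model; `fake_llm` and `TinyChain` are hypothetical names and do not come from LangChain.

```python
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; it only echoes the prompt (assumption)."""
    return f"[model answer to: {prompt}]"


class TinyChain:
    """Hypothetical miniature of a chain: format the prompt, then call the model."""

    def __init__(self, llm: Callable[[str], str], template: str, input_variable: str):
        self.llm = llm
        self.template = template
        self.input_variable = input_variable

    def run(self, value: str) -> str:
        # Step 1: fill the single placeholder in the template.
        prompt = self.template.format(**{self.input_variable: value})
        # Step 2: pass the finished prompt to the model.
        return self.llm(prompt)


tiny_chain = TinyChain(
    fake_llm,
    "Explain the concept of {concept} in a couple of lines",
    "concept",
)
answer = tiny_chain.run("autoencoders")
print(answer)
```

Swapping `fake_llm` for any callable that takes a string and returns a string would leave `TinyChain` unchanged, which is the point of separating the prompt from the model.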


In [23]:
## Building a sequential chain

second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Turn the concept description of {ml_concept} and explain it to me like I am a 5 year old in 500 words",
)

second_chain = LLMChain(llm=llm, prompt=second_prompt)

In [24]:
from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain, second_chain], verbose=True)

# Run the chain, specifying only the input variables for the first chain.
explaination = overall_chain.run("autoencoders")
print(explaination)



> Entering new SimpleSequentialChain chain...
Autoencoders are a type of artificial neural network that aim to learn efficient representations of the input data by reconstructing it. They consist of an encoder that compresses the data into a lower-dimensional latent space and a decoder that reconstructs the original data from the compressed representation. Autoencoders are useful for tasks such as dimensionality reduction, anomaly detection, and generating new data.
Autoencoders are like magical machines that can learn how to draw and recognize pictures. They try to understand what the pictures look like and then draw them on their own! Let's imagine you have a picture of a cute puppy, but it's too big and complicated. The autoencoder helps you make the picture smaller and simpler by squeezing it.

So, the autoencoder has two important parts: the "encoder" and the "decoder." The encoder squishes the picture and makes it smaller by looking careful
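
Conceptually, a simple sequential chain passes the output of one step as the input of the next, as the two printed stages above suggest. A minimal sketch in plain Python, with stub functions instead of LLM chains; `run_sequentially`, `describe`, and `simplify` are hypothetical names and nothing here is LangChain's own API.

```python
from typing import Callable, List

Step = Callable[[str], str]


def run_sequentially(steps: List[Step], first_input: str) -> str:
    """Feed each step's output into the next step (hypothetical helper)."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


# Stub steps standing in for the two LLM chains; no model is called.
def describe(concept: str) -> str:
    return f"description of {concept}"


def simplify(description: str) -> str:
    return f"simple version of ({description})"


result = run_sequentially([describe, simplify], "autoencoders")
print(result)
```

Only the first step receives the user's input; every later step sees just the text produced by the step before it.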

#### Let's split the text into chunks so we can store them in Pinecone

In [25]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=0,
)

texts = text_splitter.create_documents([explaination])

In [26]:
texts

[Document(page_content='Autoencoders are like magical machines that can learn how to draw and recognize pictures. They try'),
 Document(page_content="to understand what the pictures look like and then draw them on their own! Let's imagine you have a"),
 Document(page_content="picture of a cute puppy, but it's too big and complicated. The autoencoder helps you make the"),
 Document(page_content='picture smaller and simpler by squeezing it.'),
 Document(page_content='So, the autoencoder has two important parts: the "encoder" and the "decoder." The encoder squishes'),
 Document(page_content='the picture and makes it smaller by looking carefully at all the details. It learns what parts of'),
 Document(page_content="the picture are important, like the puppy's eyes or its fluffy tail. Then, it takes all this"),
 Document(page_content='important information and puts it into a special box.'),
 Document(page_content='Now, the decoder does the opposite. It takes the information from the box and 

In [28]:
# Extract the plain text (page content) of the first chunk
texts[0].page_content

'Autoencoders are like magical machines that can learn how to draw and recognize pictures. They try'
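
To build intuition for `chunk_size`, here is a simplified sketch of chunking in plain Python. It greedily packs whole words into chunks of at most `chunk_size` characters. This is an assumption-laden illustration, not the algorithm `RecursiveCharacterTextSplitter` uses, and `split_on_words` is a hypothetical helper.

```python
from typing import List


def split_on_words(text: str, chunk_size: int) -> List[str]:
    """Greedily pack whitespace-separated words into chunks of at most
    `chunk_size` characters. A single word longer than `chunk_size` is
    kept whole as its own chunk."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else f"{current} {word}"
        if len(candidate) <= chunk_size or not current:
            # The word still fits, or it is the first word of a new chunk.
            current = candidate
        else:
            # The word would overflow the chunk: close it and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Autoencoders are like magical machines that can learn how to draw and "
    "recognize pictures. They try to understand what the pictures look like."
)
chunks = split_on_words(sample, chunk_size=100)
for chunk in chunks:
    print(repr(chunk))
```

Because words are never cut in half, chunks are usually a little shorter than `chunk_size`, which matches the uneven chunk lengths seen in the output above.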

In [29]:
# Turn the text into an embedding using an OpenAI model
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model_name="ada")

In [32]:
# With the OpenAI model we can call embed_query as follows
query_result = embeddings.embed_query(texts[0].page_content)
query_result
# The output below is the embedding (vector representation) of the text.

[-0.03769625485434404,
 0.0024221035119735713,
 -0.0010334417825770546,
 -0.004520378702609853,
 0.0029309765550992184,
 0.02878767354760108,
 -0.014155922031716657,
 -0.027624535429406058,
 -0.011975038292931648,
 -0.031959870108719514,
 -0.0032316740824861424,
 0.02799462440732147,
 0.004774815107757351,
 -0.026078091002844736,
 0.0072365708439142504,
 0.01617819687376835,
 0.004821076229996777,
 0.012464085107550303,
 -0.017592467995820756,
 -0.0031705432306588103,
 -0.014380619443349648,
 -0.0007385267790547328,
 -0.016574720512585548,
 -0.0233288546040905,
 -0.006807002947647216,
 -0.0004196550148441912,
 0.00863101550462901,
 -0.034603366001221394,
 0.0009367890059169718,
 0.010137807939837658,
 0.012113821193988622,
 -0.007216744848237912,
 -0.002179232270970604,
 -0.053266448939435704,
 -0.026990096117182373,
 -0.02311737545423101,
 -0.007137439934209951,
 -0.028100364913573823,
 0.006773959155858681,
 -0.014539229271405569,
 0.013237307320830966,
 0.01250373709890298,
 0.00422