## QuickStart Tutorial for Beginners
https://www.youtube.com/watch?v=aywZrzNaKjs&t=656s

## 1. Unpacking LangChain
Install the libraries in the virtual environment (venv), then check the installation.

In [12]:
# pip install python-dotenv
# pip install langchain
# pip install langchain-openai
# pip install pinecone
# pip install openai
!pip show python-dotenv langchain langchain-openai pinecone openai

Name: python-dotenv
Version: 0.21.0
Summary: Read key-value pairs from a .env file and set them as environment variables
Home-page: https://github.com/theskumar/python-dotenv
Author: Saurabh Kumar
Author-email: me+github@saurabh-kumar.com
License: BSD-3-Clause
Location: C:\Users\CGM\anaconda3\Lib\site-packages
Requires: 
Required-by: anaconda-cloud-auth, pydantic-settings
---
Name: langchain
Version: 0.3.11
Summary: Building applications with LLMs through composability
Home-page: https://github.com/langchain-ai/langchain
Author: 
Author-email: 
License: MIT
Location: C:\Users\CGM\anaconda3\Lib\site-packages
Requires: aiohttp, langchain-core, langchain-text-splitters, langsmith, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by: langchain-community
---
Name: langchain-openai
Version: 0.2.12
Summary: An integration package connecting OpenAI and LangChain
Home-page: https://github.com/langchain-ai/langchain
Author: 
Author-email: 
License: MIT
Location: C:\Users\CGM\anac

In [13]:
# Import the necessary functions from the dotenv module.
# `load_dotenv` loads environment variables from a `.env` file.
# `find_dotenv` searches for the `.env` file, starting in the current
# directory and moving up through its parent directories.
from dotenv import load_dotenv, find_dotenv

# Locate the `.env` file using `find_dotenv` and load the environment variables
# defined in it into the current system environment.
load_dotenv(find_dotenv())

True
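
The `True` above only indicates that at least one variable was set; it does not confirm that the keys needed by later cells are present. A minimal sketch of such a check, assuming the variable names `OPENAI_API_KEY` (read by default by `langchain-openai`) and `PINECONE_API_KEY` (used later in this notebook). The helper `missing_env_vars` is hypothetical, not part of `python-dotenv`:

```python
import os

def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]

# Variable names assumed for this notebook.
required = ["OPENAI_API_KEY", "PINECONE_API_KEY"]
missing = missing_env_vars(required)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Running this right after `load_dotenv` surfaces a misnamed or missing key before the first API call fails.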

## 2. LLM Wrappers

In [10]:
# The ChatOpenAI class facilitates interactions with OpenAI's chat-based language 
# models, enabling the development of applications that require conversational AI 
# capabilities
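# Import from the langchain-openai package installed above; the older
# `langchain.chat_models` import path is deprecated.
from langchain_openai import ChatOpenAI
# Initialize the ChatOpenAI model with specific parameters
# temperature=0 minimizes randomness, so responses are as deterministic as possible.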
# As of December 2024, the model 'text-davinci-003' is deprecated and no longer supported.
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
# Generate a response using the 'invoke' method
# invoke method is used to send a prompt to the model and retrieve its response
response = llm.invoke("Explain large language models in one sentence")
print(response)

content='Large language models are advanced artificial intelligence systems that are trained on vast amounts of text data to generate human-like text and perform various natural language processing tasks.' additional_kwargs={} response_metadata={'token_usage': {'completion_tokens': 30, 'prompt_tokens': 15, 'total_tokens': 45, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-d826fd37-be6c-4373-ac79-7cd0779c2767-0'
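
The printed object is a message whose text lives in `content` and whose token counts live in `response_metadata['token_usage']`, as the output above shows. A small sketch of reading those counts follows; the helper `summarize_usage` is hypothetical, and the dictionary is copied (abridged) from the output above so that the example runs without an API call:

```python
def summarize_usage(response_metadata):
    """Format the token counts found in a response_metadata dictionary."""
    usage = response_metadata.get("token_usage", {})
    return (
        f"{usage.get('prompt_tokens', 0)} prompt + "
        f"{usage.get('completion_tokens', 0)} completion = "
        f"{usage.get('total_tokens', 0)} total tokens"
    )

# Abridged copy of the metadata printed above; in the notebook this
# would be `response.response_metadata`.
metadata = {
    "token_usage": {"completion_tokens": 30, "prompt_tokens": 15, "total_tokens": 45},
    "model_name": "gpt-3.5-turbo",
    "finish_reason": "stop",
}
print(summarize_usage(metadata))
```

In the notebook itself, `print(response.content)` shows only the generated sentence.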


In [11]:
# LangChain represents each chat turn as a typed message object, so that
# models, users, and system instructions exchange data in a consistent,
# predictable structure.
from langchain_core.messages import (
    # Represents a message generated by an AI model
    AIMessage,
    # Represents a message sent by a human user
    HumanMessage,
    # Represents a message that provides system-level instructions or context 
    # to guide the conversation
    SystemMessage
)

In [4]:
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)
messages = [
    SystemMessage(content="You are an expert data scientist"),
    HumanMessage(content="Write a Python script that trains a neural network on simulated data")
]
response = chat.invoke(messages)

In [5]:
print(response.content)

Sure, here is an example Python script that trains a simple neural network on simulated data using the TensorFlow library:

```python
import numpy as np
import tensorflow as tf

# Generate simulated data
np.random.seed(0)
X = np.random.rand(1000, 2)
y = np.sum(X, axis=1)

# Define the neural network architecture
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(2,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X, y, epochs=10, batch_size=32)

# Evaluate the model
loss = model.evaluate(X, y)
print(f'Final loss: {loss}')
```

In this script, we first generate some simulated data where the target variable is the sum of the input features. We then define a simple neural network with two hidden layers and train it on the simulated data using the mean squared error loss function and the Adam optimiz

## 3. Prompt Templates
### Step-by-step code

In [6]:
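# Import from langchain_core; importing PromptTemplate from the
# top-level `langchain` package is deprecated.
from langchain_core.prompts import PromptTemplate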

template = """
You are an expert data scientist with an expertise in building deep learning models.
Explain the concept of {concept} in a couple of lines
"""

prompt = PromptTemplate(
    input_variables=["concept"],
    template=template,
)

# Initialize ChatOpenAI
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)

# Format the prompt using the template
formatted_prompt = prompt.format(concept="regularization")

# Wrap the formatted prompt in a HumanMessage
messages = [HumanMessage(content=formatted_prompt)]

# Use the invoke method to send the message to the LLM
response = llm.invoke(messages)

# Print the response content
print(response.content)

Regularization is a technique used in machine learning to prevent overfitting by adding a penalty term to the loss function that discourages overly complex models. This helps to improve the generalization performance of the model on unseen data.
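
For a simple template like this one, filling in the placeholder behaves much like Python's own `str.format`. The sketch below uses only the standard library to list a template's placeholders and fill them in; it illustrates the idea and is not LangChain's implementation. The helper `placeholder_names` is hypothetical:

```python
from string import Formatter

template = """
You are an expert data scientist with an expertise in building deep learning models.
Explain the concept of {concept} in a couple of lines
"""

def placeholder_names(text):
    """List the named placeholders in a str.format-style template."""
    return sorted({field for _, field, _, _ in Formatter().parse(text) if field})

# The placeholder names play the role of `input_variables`.
print(placeholder_names(template))

# Filling the placeholder yields the text that is sent to the model.
print(template.format(concept="regularization"))
```

Keeping the template separate from its inputs is what lets the same prompt be reused for "regularization", "autoencoder", or any other concept.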


### Simplified code

In [7]:
# Define the prompt template
prompt = PromptTemplate(
    input_variables=["concept"],
    template="""
    You are an expert data scientist with an expertise in building deep learning models.
    Explain the concept of {concept} in a couple of lines.
    """
)

# Initialize the ChatOpenAI model
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)

# Directly invoke the model with formatted prompt
response = llm.invoke([
    HumanMessage(content=prompt.format(concept="autoencoder"))
])

# Print the response content
print(response.content)

An autoencoder is a type of neural network that learns to encode input data into a lower-dimensional representation and then decode it back to its original form. It is commonly used for dimensionality reduction, feature learning, and data compression tasks.


## 4. Chains

In [8]:
# Import the LLMChain class from the LangChain library
from langchain.chains import LLMChain

# Initialize the LLMChain with the language model (llm) and the prompt template
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain by specifying the input variable 'concept' as "autoencoder"
# The input is passed as a dictionary where the key is the variable name used in the prompt
print(chain.run({"concept": "autoencoder"}))


An autoencoder is a type of neural network that learns to encode input data into a lower-dimensional representation and then decode it back to its original form. It is commonly used for dimensionality reduction, feature learning, and data compression tasks.


In [9]:
# Create a PromptTemplate with a specific input variable 'ml_concept'
# The template instructs the model to explain the given machine learning
# concept in simple terms, as if to a 5-year-old, with a 200-word limit
second_prompt = PromptTemplate(
    input_variables=["ml_concept"],
    template="Turn the concept description of {ml_concept} and explain it to me like I'm five in 200 words",
)

# Initialize a new LLMChain using the language model (llm) and the second 
# prompt template
chain_two = LLMChain(llm=llm, prompt=second_prompt)

In [10]:
# Import the SimpleSequentialChain class from the LangChain library
from langchain.chains import SimpleSequentialChain

# Combine two chains (chain and chain_two) into a single sequential chain
# The output of the first chain will serve as input for the second chain
# 'verbose=True' enables logging to display intermediate outputs during execution
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

# Run the sequential chain, specifying only the input variable for the
# first chain.  Here, "autoencoder" is provided as the input for 'concept'
# in the first chain
explanation = overall_chain.run("autoencoder")

# Print the final output produced after running the combined chains
print(explanation)



> Entering new SimpleSequentialChain chain...
An autoencoder is a type of artificial neural network that learns to encode input data into a compact representation, known as the latent space, and then decode it back to reconstruct the original input. It is commonly used for dimensionality reduction, feature learning, and data denoising.
An autoencoder is like a magic box that can take a picture of a cat, squish it into a tiny picture, and then turn it back into the original cat picture. It's like a puzzle that can take a big piece and turn it into a small piece, and then put it back together perfectly. 

Imagine you have a big box of toys, but you want to make it smaller so it's easier to carry. The autoencoder can help you squish all the toys into a tiny box, and then when you want to play with them again, it can make them big again. It's like a superhero power that can make things smaller and then big again.

People use autoencoders to help comp
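
The sequential pattern above, where each step's output becomes the next step's input, can be sketched without any LLM calls. In this minimal illustration, `run_sequential`, `explain`, and `simplify` are hypothetical stand-ins for the two chains, not LangChain APIs:

```python
def run_sequential(steps, first_input):
    """Feed first_input to the first step and each result to the next step."""
    value = first_input
    for step in steps:
        value = step(value)
    return value

# Hypothetical stand-ins for the two LLM-backed chains.
def explain(concept):
    return f"A short expert explanation of {concept}."

def simplify(description):
    return f"Explained for a five-year-old: {description}"

print(run_sequential([explain, simplify], "autoencoder"))
```

Because each step takes a single string and returns a single string, only the first input has to be supplied by the caller.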

## 5. Embeddings and VectorStores

In [11]:
# Import the RecursiveCharacterTextSplitter class from the LangChain library
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Initialize the text splitter with specified chunk size and overlap
# 'chunk_size=100' sets the maximum number of characters per chunk
# 'chunk_overlap=0' ensures no overlapping characters between consecutive chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=0,
)

# Split the input text (explanation) into smaller chunks using the text splitter
# The input is provided as a list containing the 'explanation' text
texts = text_splitter.create_documents([explanation])

In [12]:
texts

[Document(metadata={}, page_content='An autoencoder is like a magic box that can take a picture of a cat, squish it into a tiny picture,'),
 Document(metadata={}, page_content="and then turn it back into the original cat picture. It's like a puzzle that can take a big piece"),
 Document(metadata={}, page_content='and turn it into a small piece, and then put it back together perfectly.'),
 Document(metadata={}, page_content="Imagine you have a big box of toys, but you want to make it smaller so it's easier to carry. The"),
 Document(metadata={}, page_content='autoencoder can help you squish all the toys into a tiny box, and then when you want to play with'),
 Document(metadata={}, page_content="them again, it can make them big again. It's like a superhero power that can make things smaller"),
 Document(metadata={}, page_content='and then big again.'),
 Document(metadata={}, page_content="People use autoencoders to help computers learn about pictures, sounds, and other things. It's like"

In [13]:
# Access the content of the first text chunk created by the text splitter
# 'texts' is a list of document objects, and 'page_content' contains the
# actual text of each chunk
texts[0].page_content

'An autoencoder is like a magic box that can take a picture of a cat, squish it into a tiny picture,'
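
To build intuition for how a 100-character limit with no overlap produces chunks like the one above, here is a simplified sketch that greedily packs whole words into chunks. It is an assumption-laden illustration, not the library's algorithm: `RecursiveCharacterTextSplitter` tries a list of separators in order (paragraph breaks, line breaks, spaces, then single characters), while the hypothetical `split_by_words` below splits only on whitespace:

```python
def split_by_words(text, chunk_size=100):
    """Greedily pack words into chunks of at most chunk_size characters.

    A single word longer than chunk_size is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks

sample = (
    "An autoencoder is like a magic box that can take a picture of a cat, "
    "squish it into a tiny picture, and then turn it back into the original "
    "cat picture."
)
for chunk in split_by_words(sample, chunk_size=100):
    print(len(chunk), repr(chunk))
```

On this sample the first chunk ends at "tiny picture," because adding the next word would exceed 100 characters.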

In [14]:
# Import the OpenAIEmbeddings class from the LangChain OpenAI module
from langchain_openai import OpenAIEmbeddings

# Initialize the OpenAIEmbeddings object with a specific embedding model
# 'model="text-embedding-ada-002"' specifies the model to use for generating embeddings
# This model converts text into numerical vectors suitable for further processing
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

In [20]:
# Generate an embedding (numerical vector) for the given text using the
# embedding model. 'embed_query' processes the content of the first text
# chunk (texts[0].page_content) and converts it into a vector
# representation for use in similarity searches or other tasks
query_results = embeddings.embed_query(texts[0].page_content)
query_results

[-0.034905511885881424,
 -0.0024193432182073593,
 0.005823166109621525,
 0.005330926273018122,
 -0.0016466601518914104,
 0.01159275509417057,
 -0.024190083146095276,
 -0.05055003985762596,
 -0.0054916576482355595,
 -0.037745099514722824,
 0.011117258109152317,
 0.04428151249885559,
 0.01768045872449875,
 -0.0024678974878042936,
 0.02108260802924633,
 -0.0036365489941090345,
 -0.005079783499240875,
 0.015457007102668285,
 -0.008103543892502785,
 -0.01435867603868246,
 -0.013213464058935642,
 0.01261741854250431,
 -0.0037671432364732027,
 -0.02885129489004612,
 -0.018631452694535255,
 0.011378446593880653,
 0.006405817810446024,
 -0.021980024874210358,
 0.0017228401266038418,
 0.007654835004359484,
 0.021471042186021805,
 -0.034905511885881424,
 -0.019234197214245796,
 -0.05132690817117691,
 -0.04243310168385506,
 0.014572984538972378,
 0.02607867680490017,
 -0.018805578351020813,
 0.013575109653174877,
 -0.003274903167039156,
 0.026560870930552483,
 0.008043269626796246,
 -0.00870628654

In [19]:
# Provides access to environment variables
import os  
# Imports Pinecone library for vector database operations
from pinecone import Pinecone, ServerlessSpec
# Imports LangChain integration for Pinecone
from langchain.vectorstores import Pinecone as LangChainPinecone

# Initialize a Pinecone instance using the API key
# The API key is retrieved securely from an environment variable named "PINECONE_API_KEY"
pc = Pinecone(
    api_key=os.getenv("PINECONE_API_KEY")
)

In [20]:
# Define the name of the Pinecone index to be used or created
index_name = "langchain-quickstart"

In [22]:
# Create a new index in Pinecone with the specified configuration
pc.create_index(
    # Name of the index to be created
    name=index_name,
    # Dimensionality of the embeddings 
    # (e.g., 'text-embedding-ada-002' outputs 1536-dimensional vectors)
    dimension=1536,
    # Similarity metric to use for comparing vectors
    # (cosine similarity in this case)
    metric="cosine",
    # Define serverless deployment settings
    spec=ServerlessSpec(
        # Cloud provider where the index will be hosted
        cloud="aws",
        # Specific region in the cloud provider
        region="us-east-1"
    ) 
)

In [23]:
# Initialize the OpenAIEmbeddings object using the specified model
# 'text-embedding-ada-002' is an OpenAI embedding model that converts 
# text into 1536-dimensional vectors
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

In [24]:
# Load documents into the Pinecone vector database using LangChain's integration
vectorstore = LangChainPinecone.from_documents(
    # The preprocessed documents or text chunks to be embedded and stored
    texts,
    # The embedding model used to generate vector representations
    embeddings,
    # The name of the Pinecone index where vectors will be stored
    index_name=index_name
)

In [25]:
# Perform a similarity search in the Pinecone vector database
# The input query to search for similar documents
query = "What is magical about auto encoder?"
# Search for documents similar to the query based on embeddings
result = vectorstore.similarity_search(query)

In [26]:
# Show the results
print(result)

[Document(metadata={}, page_content='An autoencoder is like a magic box that can take a picture of a cat, squish it into a tiny picture,'), Document(metadata={}, page_content='autoencoder can help you squish all the toys into a tiny box, and then when you want to play with'), Document(metadata={}, page_content="People use autoencoders to help computers learn about pictures, sounds, and other things. It's like"), Document(metadata={}, page_content='teaching a computer how to play with toys and then put them away neatly. Autoencoders are really')]
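
Conceptually, the search embeds the query and returns the stored chunks whose vectors are closest under the index's metric (cosine similarity here), four by default. The brute-force sketch below shows that ranking on toy 3-dimensional vectors standing in for real 1536-dimensional embeddings; `cosine_similarity` and `top_k` are hypothetical helpers, and a vector database such as Pinecone uses far more efficient indexing than this linear scan:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=4):
    """Return the k (index, score) pairs with the highest cosine similarity."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy vectors; real embeddings would come from the embedding model.
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query_vec = [1.0, 0.05, 0.0]
for index, score in top_k(query_vec, docs, k=2):
    print(index, round(score, 3))
```

The returned indices would map back to the stored text chunks, which is what `similarity_search` hands back as `Document` objects.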


## 6. Agents
For new agents, LangChain now recommends LangGraph instead of the legacy `langchain.agents` API.