# Introduction to LangChain and its Capabilities

LangChain is an open-source framework designed to simplify the development of applications powered by large language models (LLMs). It provides a high-level API that allows developers to chain together multiple LLMs, data sources, and tools to create complex applications. Here are some key features and capabilities of LangChain:

# Key Features
Modular Architecture: LangChain's modular design allows developers to easily swap out components, such as language models, data sources, and processing steps, without disrupting the entire application. This flexibility enables rapid experimentation and iteration.

Unified Interface: Despite supporting multiple LLMs from various providers, LangChain offers a consistent and unified interface, abstracting away the complexities of interacting with different models.

Memory Management: LangChain simplifies the management of conversational memory, enabling applications to maintain context and continuity across interactions. This feature is particularly valuable for building chatbots, virtual assistants, and other conversational AI systems.

# Chain Feature in LangChain

The chain feature in LangChain allows you to create a sequence of operations that can be performed in a linear or branching manner. This is especially useful when you want to combine multiple language model queries, data retrieval operations, or processing steps into a single, cohesive workflow. Here’s an overview of the chain feature and its capabilities:

# Chain Feature Overview
Sequential Execution: Chains enable you to execute a series of steps in a defined order. Each step takes the output from the previous step as its input, allowing for complex workflows to be built in a modular and reusable manner.

Branching and Conditional Logic: Chains can include conditional logic and branching, enabling you to create more dynamic and adaptable workflows. Depending on certain conditions, the chain can follow different paths to handle various scenarios.

Integration of Multiple Components: Chains can integrate various components, such as different language models, data sources, and processing functions. This allows you to leverage the strengths of multiple tools and models in a single workflow.

Error Handling: Chains can include error handling mechanisms to ensure robustness and reliability. If an error occurs at any step in the chain, you can define how to handle it and whether to continue or abort the workflow.
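
The three behaviours described above (sequential execution, branching, and error handling) can be illustrated without any library. The following is a minimal plain-Python sketch, not LangChain's implementation: `Step`, `branch`, and `with_fallback` are hypothetical names introduced only for this example.

```python
from typing import Any, Callable


class Step:
    """A hypothetical stand-in for a chain step; not a LangChain class."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # step_a | step_b runs step_a first and feeds its output to step_b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


def branch(condition: Callable[[Any], bool], if_true: Step, if_false: Step) -> Step:
    """Route the input to one of two steps depending on a condition."""
    return Step(lambda value: (if_true if condition(value) else if_false).invoke(value))


def with_fallback(step: Step, fallback: Step) -> Step:
    """Run `step`; if it raises, run `fallback` on the same input instead."""
    def run(value: Any) -> Any:
        try:
            return step.invoke(value)
        except Exception:
            return fallback.invoke(value)
    return Step(run)


# Sequential execution: strip whitespace, then lowercase.
normalize = Step(str.strip) | Step(str.lower)

# Branching: questions and statements follow different paths.
route = branch(
    lambda text: text.endswith("?"),
    Step(lambda text: f"question: {text}"),
    Step(lambda text: f"statement: {text}"),
)
chain = normalize | route
print(chain.invoke("  Can you suggest green palazzos?  "))

# Error handling: fall back to 0 when the input is not an integer.
safe_int = with_fallback(Step(int), Step(lambda _: 0))
print(safe_int.invoke("42"), safe_int.invoke("not a number"))
```

The `|` operator is used here because the chain built later in this notebook composes its components the same way.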


# Need for LangChain and RAG

Limitation of LLM Training: LLMs are trained on extensive public data, but not on the proprietary or private data that a specific problem depends on.

# Challenges of Fine-Tuning LLMs:

Costly: Fine-tuning an LLM requires significant compute and financial resources.

Inflexibility: Once a model is trained, updating it with new information is expensive and difficult.

Lack of Observability: It is not clear how an LLM arrives at its answer to a query.

# Advantages of Retrieval-Augmented Generation (RAG):

No Training Required: RAG eliminates the need for costly retraining.

Up-to-Date Information: Data is retrieved from its sources at query time, so answers reflect the current state of those sources.

Transparency: Because the retrieved documents can be shown alongside the answer, the process is easier to inspect and trust.

# LangChain and Context Augmentation:

Flexibility: LangChain allows LLMs to be used in a variety of applications, such as chatbots and auto-complete.

Enhanced Relevance: LangChain improves the relevance of LLM responses by incorporating specific, contextual data from diverse sources.

# Retrieval Augmented Generation (RAG)
LLMs are trained on enormous bodies of data, but they aren't trained on your data. Retrieval-Augmented Generation (RAG) solves this problem by adding your data to the data LLMs already have access to. RAG is the approach used in the rest of this notebook.

In RAG, your data is loaded and prepared for queries or "indexed". User queries act on the index, which filters your data down to the most relevant context. This context and your query then go to the LLM along with a prompt, and the LLM provides a response.
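
The index, retrieve, and prompt steps described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: word overlap stands in for embedding similarity, the sample documents are made up, and the function names are hypothetical. The final prompt would be sent to an LLM, which is left out here.

```python
import re
from collections import Counter

DOCUMENTS = [
    "Green palazzos with an elasticated waistband",
    "Black printed kurta with palazzos and dupatta",
    "Blue denim jacket with button closure",
]  # made-up sample rows, not taken from the dataset


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    """'Index' step: precompute a word-count vector per document."""
    return [(doc, tokenize(doc)) for doc in documents]


def retrieve(index, query: str, k: int = 2) -> list[str]:
    """Return up to k documents sharing the most words with the query."""
    query_words = tokenize(query)
    scored = [(sum((words & query_words).values()), doc) for doc, words in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved context and the query into one prompt string."""
    joined = "\n".join(f"- {doc}" for doc in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


index = build_index(DOCUMENTS)
query = "Can you suggest green palazzos?"
print(build_prompt(query, retrieve(index, query)))
```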

# Significance of LangChain
Primary Function: LangChain is a library, available for Python and JavaScript, that enables the development of custom NLP applications using large language models.

Features: Stands out for its versatility and adaptability in building robust applications with LLMs.

Key Features: Integrates with LLMs from many providers, including OpenAI and Hugging Face models. Well suited to text generation, question answering, chatbots, and summarizing lengthy documents.

In [1]:
#Importing the needed libraries
from langchain_community.document_loaders.csv_loader import CSVLoader

In [3]:
#LangChain implements a CSV Loader that will load CSV files into a sequence of Document objects. 
#Each row of the CSV file is translated to one document.
file_path = r"C:\Users\aminu\OneDrive\Documents\Fashion Dataset v2.csv"
loader = CSVLoader(file_path=file_path)
data = loader.load()
for record in data[:2]:
    print(record)

page_content='p_id: 17048614
name: Khushal K Women Black Ethnic Motifs Printed Kurta with Palazzos & With Dupatta
products: Kurta, Palazzos, Dupatta
price: 5099
colour: Black
brand: Khushal K
img: http://assets.myntassets.com/assets/images/17048614/2022/2/4/b0eb9426-adf2-4802-a6b3-5dbacbc5f2511643971561167KhushalKWomenBlackEthnicMotifsAngrakhaBeadsandStonesKurtawit7.jpg
ratingCount: 4522
avg_rating: 4.418398939
description: Black printed Kurta with Palazzos with dupatta <br> <br> <b> Kurta design:  </b> <ul> <li> Ethnic motifs printed </li> <li> Anarkali shape </li> <li> Regular style </li> <li> Mandarin collar,  three-quarter regular sleeves </li> <li> Calf length with flared hem </li> <li> Viscose rayon machine weave fabric </li> </ul> <br> <b> Palazzos design:  </b> <ul> <li> Printed Palazzos </li> <li> Elasticated waistband </li> <li> Slip-on closure </li> </ul>Dupatta Length 2.43 meters Width:&nbsp;88 cm<br>The model (height 5'8) is wearing a size S100% Rayon<br>Machine wash
p_att
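
The row-to-document mapping seen in the printed record (one `column: value` line per field) can be imitated with the standard library. This is only a sketch of the idea, not the loader's actual code: the sample CSV is made up, and the `metadata` keys `source` and `row` are assumptions chosen for illustration.

```python
import csv
import io

# Made-up two-row CSV standing in for the real file.
SAMPLE_CSV = "p_id,name,price\n1,Green Palazzos,1799\n2,Black Kurta,5099\n"


def rows_to_documents(csv_text: str, source: str = "sample.csv") -> list[dict]:
    """Turn each CSV row into a dict with 'page_content' and 'metadata'."""
    documents = []
    for row_number, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        # One "column: value" line per field, joined into a single text block.
        content = "\n".join(f"{column}: {value}" for column, value in row.items())
        documents.append(
            {"page_content": content, "metadata": {"source": source, "row": row_number}}
        )
    return documents


for document in rows_to_documents(SAMPLE_CSV)[:2]:
    print(document)
```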

In [8]:
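# Read the OpenAI API key from a local text file and expose it to the OpenAI client
import os
filepath = r"C:\Users\aminu\OneDrive\Documents\open_ai_secret_key_2.txt"
with open(filepath, 'r') as f:
    # strip() drops the trailing newline, which would otherwise end up inside the key
    openai_api_key = f.read().strip()
os.environ["OPENAI_API_KEY"] = openai_api_key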

In [6]:
from langchain_openai.embeddings import OpenAIEmbeddings

In [9]:
# Create an instance of OpenAIEmbeddings
# The model parameter specifies the name of the model to use
embedding = OpenAIEmbeddings(model = "text-embedding-ada-002")
"""
Documentation:
The OpenAIEmbeddings class is used to generate embeddings (numerical representations of text) using one of OpenAI's embedding models.
In this instance, the 'text-embedding-ada-002' model is specified. This model is part of OpenAI's suite of text embedding models,
designed to provide high-quality embeddings for a wide range of natural language processing tasks.
Attributes:
- model: A string representing the name of the embedding model to be used.
"""
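
Embeddings are useful because texts with similar meaning end up with vectors that point in similar directions. A common way to compare two vectors is cosine similarity, sketched below. The three-dimensional vectors are hypothetical values chosen for illustration; real embeddings have many more dimensions and come from the embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings" for three product names.
green_palazzos = [0.9, 0.1, 0.0]
olive_trousers = [0.8, 0.2, 0.1]
leather_wallet = [0.0, 0.1, 0.9]

# The two garments should score higher with each other than with the wallet.
print(cosine_similarity(green_palazzos, olive_trousers))
print(cosine_similarity(green_palazzos, leather_wallet))
```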

In [12]:
pip install chromadb


In [13]:
from langchain_community.vectorstores import Chroma

In [14]:
# Create a vector store instance using the Chroma class
# 'documents' parameter specifies the data to be embedded
# 'embedding' parameter specifies the embedding model instance to be used
# 'persist_directory' parameter specifies the directory to save the embedded vectors
vectorstore = Chroma.from_documents(documents = data,
                                    embedding = embedding,
                                    persist_directory = "./Myntra_dataset_embedding")
"""
Documentation:
The Chroma class is used to create a vector store from a set of documents using specified embeddings.
This allows for efficient storage and retrieval of vector representations of the documents.
"""


In [15]:
# Create a retriever from the vector store
# 'search_type' parameter specifies the type of search to be used (e.g., 'mmr' for Maximal Marginal Relevance)
# 'search_kwargs' parameter specifies additional keyword arguments for the search, including:
# - 'k': The number of top results to return
# - 'lambda_mult': The relevance/diversity trade-off for MMR, from 0 (maximum diversity) to 1 (minimum diversity)

retriever = vectorstore.as_retriever(search_type = 'mmr', 
                                     search_kwargs = {'k': 3, 
                                                      'lambda_mult': 0.7})
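
To make the roles of `k` and `lambda_mult` concrete, here is a plain-Python sketch of Maximal Marginal Relevance. It is an illustrative implementation of the general technique, not the vector store's own code; the function names and the two-dimensional vectors are assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query, candidates, k=3, lambda_mult=0.7):
    """Return indices of up to k candidates chosen by Maximal Marginal Relevance.

    Each round picks the candidate maximizing
    lambda_mult * relevance - (1 - lambda_mult) * redundancy,
    where redundancy is the highest similarity to anything already picked.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.0],    # most relevant
    [0.99, 0.14],  # near-duplicate of the first
    [0.6, 0.8],    # less relevant, but different from the first
]
print(mmr(query, candidates, k=2, lambda_mult=1.0))  # relevance only
print(mmr(query, candidates, k=2, lambda_mult=0.3))  # diversity weighted heavily
```

With `lambda_mult=1.0` the redundancy term vanishes and the ranking is by relevance alone. Lower values penalize candidates that resemble results already selected.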

In [43]:
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.runnables import RunnableParallel
from langchain_core.output_parsers import StrOutputParser
from langchain.memory import ConversationSummaryMemory
from operator import itemgetter

In [85]:
# Define a template string for a professional conversation between a human and an AI
# The AI is designed to be concise and provide specific details from its context

TEMPLATE = '''
The following is a professional conversation between a human and an AI. 
The AI is concise and provides lots of specific details from its context. 

Current conversation : 
{message_log}

Human:
{question}

AI:

To answer the question, use only the following context:
{context}

'''

prompt_template = PromptTemplate.from_template(TEMPLATE)

"""
Documentation:
The TEMPLATE string defines the structure for a professional conversation between a human and an AI.
It includes placeholders for the message log, the human's question, and the context to be used by the AI in its response.
Parameters:
- message_log: A placeholder for the current conversation's message log.
- question: A placeholder for the human's question.
- context: A placeholder for the specific context to be used by the AI in its response.

The PromptTemplate class is used to create a template instance from the defined template string.
"""
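
The placeholder mechanism can be illustrated with Python's built-in string formatting, which uses the same `{name}` syntax. This is a sketch of the idea rather than the PromptTemplate class itself; the shortened template and the filled-in values are hypothetical.

```python
from string import Formatter

# A shortened, hypothetical template with the same three placeholders.
template = (
    "Current conversation:\n{message_log}\n\n"
    "Context:\n{context}\n\n"
    "Human:\n{question}\n"
)

# Discover the placeholder names, in order of appearance.
placeholders = [name for _, name, _, _ in Formatter().parse(template) if name]
print(placeholders)

# Fill every placeholder to obtain the final prompt text.
filled = template.format(
    message_log="(no earlier messages)",
    context="DORIYA Women Green Palazzos, price: 1799",
    question="Can you suggest green palazzos?",
)
print(filled)
```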


In [86]:
# Create a chat instance using the ChatOpenAI class
# 'model_name' parameter specifies the name of the model to use
# 'seed' parameter sets a seed value for more reproducible outputs; it is passed directly rather than through 'model_kwargs'
# 'max_tokens' parameter specifies the maximum number of tokens to generate in the response
chat = ChatOpenAI(model_name = 'gpt-4',
                  seed = 365,
                  max_tokens = 250)



In [87]:
# Create a chat memory instance using the ConversationSummaryMemory class
# 'llm' parameter specifies the language model to be used (in this case, an instance of ChatOpenAI)
# 'memory_key' parameter specifies the key for storing the message log in the memory

chat_memory = ConversationSummaryMemory(llm = ChatOpenAI(), memory_key = 'message_log')
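
The idea behind a summarizing memory is that a running summary is stored instead of the full transcript, and each new exchange is folded into it. The sketch below imitates that behaviour in plain Python. `SummaryMemory` and `keep_last_200_chars` are hypothetical names, and the truncating "summarizer" is only a placeholder for the LLM call a real summary memory would make.

```python
from typing import Callable


class SummaryMemory:
    """Hypothetical stand-in for a summarizing conversation memory."""

    def __init__(self, summarize: Callable[[str, str], str], memory_key: str = "message_log"):
        self.summarize = summarize
        self.memory_key = memory_key
        self.summary = ""

    def load_memory_variables(self, inputs: dict) -> dict:
        # The inputs are ignored here; the stored summary is returned under the key.
        return {self.memory_key: self.summary}

    def save_context(self, inputs: dict, outputs: dict) -> None:
        # Fold the newest exchange into the running summary.
        new_lines = f"Human: {inputs['input']}\nAI: {outputs['output']}"
        self.summary = self.summarize(self.summary, new_lines)


def keep_last_200_chars(summary: str, new_lines: str) -> str:
    """Placeholder 'summarizer': a real one would ask an LLM to condense the text."""
    return f"{summary}\n{new_lines}".strip()[-200:]


memory = SummaryMemory(keep_last_200_chars)
memory.save_context(
    {"input": "Can you suggest green palazzos?"},
    {"output": "One option is the DORIYA Women Green Palazzos."},
)
print(memory.load_memory_variables({}))
```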

In [107]:
# Define a function "ask_a_query" that takes the user's fashion-related question and prints the response

def ask_a_query(question):
    """
    Answer a question with retrieved context and the summarized conversation so far.

    Chain components:
    - retriever: Provides the context for the current question.
    - RunnablePassthrough: Passes the question through without modification.
    - RunnablePassthrough.assign: Adds 'message_log' to the inputs by loading the memory variables.
    - RunnableLambda: Wraps chat_memory.load_memory_variables so it can be used as a chain step.
    - itemgetter: Extracts the 'message_log' from the loaded memory variables.
    - prompt_template: A template for structuring the prompt sent to the model.
    - chat: An instance of the ChatOpenAI class used to generate the AI's response.
    - StrOutputParser: Parses the AI's response into a string format.
    """
    chain1 = (
        {'context': retriever,
         'question': RunnablePassthrough()}
        | RunnablePassthrough.assign(
            message_log = RunnableLambda(chat_memory.load_memory_variables)
                          | itemgetter('message_log'))
        | prompt_template
        | chat
        | StrOutputParser()
    )

    # Invoke the chain with the human's question
    response = chain1.invoke(question)

    # Save the exchange to the chat memory so follow-up questions have context
    # 'inputs' is the question asked by the human; 'outputs' is the response generated by the AI
    chat_memory.save_context(inputs = {'input': question},
                             outputs = {'output': response})

    print(response)
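
The data flow through the first two stages of the chain can be traced with ordinary dictionaries. This is a simplified sketch, assuming hypothetical helper names (`fan_out`, `assign`) and fake stand-ins for the retriever and the memory. It shows only how the keys that the prompt template needs are assembled.

```python
def fan_out(question: str, retrieve) -> dict:
    """Mimic the leading dict: run the retriever and pass the question through."""
    return {"context": retrieve(question), "question": question}


def assign(state: dict, **new_values) -> dict:
    """Mimic `assign`: keep the existing keys and add newly computed ones."""
    added = {key: compute(state) for key, compute in new_values.items()}
    return {**state, **added}


# Hypothetical stand-ins for the retriever and the memory.
def fake_retrieve(question: str) -> list[str]:
    return ["DORIYA Women Green Palazzos, price: 1799"]


fake_memory = {"message_log": "The human asked about green palazzos."}

state = fan_out("What are the sizes available in it?", fake_retrieve)
state = assign(state, message_log=lambda _: fake_memory["message_log"])

# All three placeholders of the prompt template are now available.
print(sorted(state))
```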

In [108]:
#Sample question
question = "Can you suggest green palazzos?"

In [109]:
#Calling the function to get a response
ask_a_query(question)

Yes, one option is the DORIYA Women Green Palazzos, priced at 1799. These are green woven palazzos with a solid color and opaque. The fit is straight with an elasticated waistband and it is made from a woven cotton fabric. It is recommended to machine-wash this item. The image can be found at this [link](http://assets.myntassets.com/assets/images/18622938/2022/6/4/d3e00a71-3012-45f3-a753-18fdc72a377b1654341599166DORIYAWomenGreenPalazzos1.jpg).


In [110]:
#Asking a follow up question
question = "What are the sizes available in it?"
ask_a_query(question)

I'm sorry but the details about the available sizes of DORIYA Women Green Palazzos are not available in the provided documents.


In [111]:
ask_a_query("recommend some party wear kurtas")

Based on the provided documents, I suggest the following party wear kurtas:

1. Khushal K Women Green Ethnic Motifs Printed Gotta Patti Kurta with Trousers & Dupatta priced at 5699. It is an Anarkali shaped, green, printed Kurta with trousers and a matching dupatta. The kurta fabric is a viscose rayon machine weave, adorned with Gotta Patti detail and featuring a round neck and three-quarter flared sleeves. The trousers have an elastic waistband and slip-on closure. The outfit is best suited for festive occasions and it is advisable to machine wash it.

2. Prakrti Pink & White Ethnic Motifs Printed Pure Cotton Pleated Kurti priced at 2199. This is a pink and white straight kurti with Ethnic motif print. The kurti has a mandarin collar, three-quarter sleeves, and is made of machine-weave pure cotton. The fabric care recommends a hand wash.

3. Biba Women Cream-Coloured & Blue Printed Straight Kurti priced at 1299. A cream, blue, and grey colored kurti with a round neck, a printed panel 