# Lab objective: Implementation of Domain-specific Conversational AI

### Installing LangChain and Related Dependencies for NLP Applications

In [2]:
!pip install langchain==0.1.11 langchain_community langchain_openai pinecone-client==3.1.0 langchain_pinecone pypdf


Collecting langchain_openai
  Downloading langchain_openai-0.1.6-py3-none-any.whl.metadata (2.5 kB)
Collecting langchain-core<0.2,>=0.1.29 (from langchain==0.1.11)
  Downloading langchain_core-0.1.52-py3-none-any.whl.metadata (5.9 kB)
Downloading langchain_openai-0.1.6-py3-none-any.whl (34 kB)
Downloading langchain_core-0.1.52-py3-none-any.whl (302 kB)
Installing collected packages: langchain-core, langchain_openai
  Attempting uninstall: langchai

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pirate-speak 0.0.1 requires langchain-community<0.0.8,>=0.0.7, but you have langchain-community 0.0.29 which is incompatible.


### Setting Environment Variables for OpenAI and Pinecone API Access

This code stores the OpenAI and Pinecone API keys in environment variables, where the later cells read them to authenticate API requests. Replace the placeholder strings with your own keys before running the rest of the notebook.

In [10]:
import os

os.environ["OPENAI_API_KEY"]="your_openai_api_key"

os.environ["PINECONE_API_KEY"]="your_pinecone_api_key"
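Hard-coded placeholders are easy to forget, so it can help to check that a key is actually set before any API call is made. The following is a minimal sketch; the helper names require_env and mask_secret are hypothetical and not part of any library used in this lab.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def mask_secret(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters so a key can be shown safely."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```

For example, print(mask_secret(require_env("PINECONE_API_KEY"))) confirms that the key is present without echoing it into the notebook output. In an interactive session, the standard-library getpass.getpass function is one way to enter the key without typing it into a cell.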


### Initializing a Recursive Text Splitter with OpenAI Embeddings

This code initializes a RecursiveCharacterTextSplitter object from the LangChain library, configured to split text into chunks of up to 500 characters with a 100-character overlap between consecutive chunks.

It uses a hierarchy of separators (double newline, newline, period-space, space, and finally the empty string) and tries them in that order, so text is divided at paragraph or sentence boundaries where possible and at arbitrary characters only as a last resort. The cell also imports OpenAIEmbeddings, which is used later when the chunks are stored in the vector index.

In [4]:
# Import the text splitter and the OpenAI embeddings wrapper.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings

# Initialize a RecursiveCharacterTextSplitter with explicit separators.
# The text is split into chunks of up to 500 characters, with a 100-character overlap between chunks.
character_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", ". ", " ", ""],  # Hierarchy of separators, tried in order.
    chunk_size=500,  # Target size of each chunk in characters.
    chunk_overlap=100,  # Characters shared between consecutive chunks.
)
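To make the roles of chunk_size and chunk_overlap concrete, here is a deliberately simplified sketch that cuts text into fixed windows. It is not the recursive algorithm that LangChain uses, since it ignores separators entirely, and the function name sliding_chunks is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Cut text into fixed-size windows where each window repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # How far the window advances each time.
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The current window already reaches the end of the text.
    return chunks
```

With the settings used in this lab, each window starts 400 characters after the previous one, so a 1,200-character text yields windows starting at offsets 0, 400, and 800. The real splitter prefers to break at the listed separators, so its chunks are usually shorter than 500 characters and vary in length.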


### Loading and Displaying PDF Documents with PyPDFLoader and Custom Splitting


This code uses PyPDFLoader from the LangChain community package to load the specified PDF document. The content is then split with the text splitter defined above, which breaks the PDF's text into overlapping chunks based on character count.

The cell also defines a function, pretty_print_docs, that prints each chunk under a numbered header and separates consecutive chunks with a dashed line. It is used later to display search results.

In [5]:
from langchain_community.document_loaders import PyPDFLoader

pdf = "Supply Chain Management.pdf"

# Load the PDF and split it into chunks with the splitter defined above.
loader = PyPDFLoader(pdf)
docs = loader.load_and_split(text_splitter=character_splitter)

def pretty_print_docs(docs):
    # Print each chunk under a numbered header, separated by a dashed line.
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )
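When reviewing chunks it is often useful to see which page each one came from. The sketch below is a variant of the printing helper that returns a string and adds the page number when the document carries one. It assumes each document has a metadata dictionary with an optional "page" entry, which is how PyPDFLoader is understood to label its output (as a zero-based page index). The name format_docs and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def format_docs(docs, width: int = 100) -> str:
    """Return the chunks as one string, each under a numbered header with its page if known."""
    separator = f"\n{'-' * width}\n"
    parts = []
    for i, d in enumerate(docs, start=1):
        page = d.metadata.get("page")  # Assumed metadata key; absent for some loaders.
        header = f"Document {i}" + (f" (page {page})" if page is not None else "")
        parts.append(f"{header}:\n\n{d.page_content}")
    return separator.join(parts)


# Stand-in objects that mimic the two attributes used above.
sample_docs = [
    SimpleNamespace(page_content="alpha", metadata={"page": 0}),
    SimpleNamespace(page_content="beta", metadata={}),
]
print(format_docs(sample_docs, width=5))
```

Returning a string instead of printing makes the helper easier to reuse, for example when writing the chunks to a file.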


In [11]:
os.environ["PINECONE_API_KEY"]

'your_pinecone_api_key'

### Configuring and Managing Pinecone Indexes for Vector Storage

**Pinecone Client Configuration**: The script creates a Pinecone client from the API key and, depending on the use_serverless flag, selects either a serverless spec on AWS (region us-east-1) or a pod-based spec in the gcp-starter environment.


**Index Management**: It checks for an existing index named 'demo24' and deletes it if present, then creates a new index that uses the dot-product metric and has dimension 1536, matching the dimensionality of the OpenAI embeddings.


**Initialization Waiting**: The script polls the status of the newly created index once per second and pauses execution until the index reports that it is ready.
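The choice of the dot-product metric is easier to judge with a small worked example. For vectors of length 1, the dot product and cosine similarity give the same value, so the two metrics rank results identically. OpenAI's documentation describes its embeddings as normalized to length 1, but treat that as an assumption to verify for the model you use. The helper names below are hypothetical.

```python
import math


def dot(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity: the dot product divided by the product of the lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = normalize([3.0, 4.0])  # Becomes [0.6, 0.8].
b = normalize([4.0, 3.0])  # Becomes [0.8, 0.6].
print(dot(a, b), cosine(a, b))  # Both are approximately 0.96.
```

For vectors that are not unit length the two metrics can rank results differently, because the dot product also rewards larger magnitudes.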


In [12]:
from pinecone import Pinecone, PodSpec, ServerlessSpec
import time

use_serverless = True

# Configure the client.
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

if use_serverless:
    spec = ServerlessSpec(cloud='aws', region='us-east-1')
else:
    # If not using a starter index, you should specify a pod_type too.
    spec = PodSpec(environment='gcp-starter')

# Check for the index and delete it if it already exists.
index_name = 'demo24'
if index_name in pc.list_indexes().names():
    pc.delete_index(index_name)

# Create a new index.
pc.create_index(
    index_name,
    dimension=1536,  # Dimensionality of OpenAI embeddings.
    metric='dotproduct',
    spec=spec,
)

# Wait for the index to be initialized.
while not pc.describe_index(index_name).status['ready']:
    time.sleep(1)
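The waiting loop above runs forever if the index never becomes ready. A bounded version of the same polling pattern might look like the sketch below. The function wait_until and its parameters are hypothetical and independent of the Pinecone client.

```python
import time


def wait_until(is_ready, timeout_s: float = 120.0, poll_interval_s: float = 1.0) -> None:
    """Call is_ready() repeatedly until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while not is_ready():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout_s} seconds")
        time.sleep(poll_interval_s)
```

In this notebook it could be called as wait_until(lambda: pc.describe_index(index_name).status['ready']), which keeps the one-second polling interval but raises a TimeoutError after two minutes instead of hanging.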


### Implementing Document Similarity Search Using Pinecone and LangChain

**Vector Store Initialization**: Embeds the document chunks with OpenAI embeddings and stores them in the Pinecone index through a PineconeVectorStore instance.

**Similarity Search Function**: Defines a function to perform similarity searches in the document index, with an option to return similarity scores.

**User Interaction**: Collects user queries, retrieves similar documents based on these queries, and displays them in a formatted manner.

In [15]:
from langchain_pinecone import PineconeVectorStore

# Embed the chunks and upload them to the Pinecone index.
docsearch = PineconeVectorStore.from_documents(docs, embedding=OpenAIEmbeddings(), index_name=index_name)

def get_similar_docs(query, k=6, score=False):
    # Return the k most similar chunks, optionally paired with their similarity scores.
    if score:
        similar_docs = docsearch.similarity_search_with_score(query, k=k)
    else:
        similar_docs = docsearch.similarity_search(query, k=k)
    return similar_docs

query = input("Query: ")
retrieved_docs = get_similar_docs(query)
pretty_print_docs(retrieved_docs)


Query: What is supply chain management?
Document 1:

Defining Supply Chain
Management in the Journal of Business Logistics (Mentzer, John
T.,William DeWitt, James S. Keebler, Soonhong Min, Nancy
W.Nix, Carlo D. Smith, and Zach G. Zacharia, 2001,
“Defining Supply Chain Management,” Journal of Business
Logistics ,Vol. 22, No. 2, p. 18).
3Basic Concepts of Supply Chain ManagementHugos_ch1.1.qxd  11/5/02  11:30 AM  Page 3
----------------------------------------------------------------------------------------------------
Document 2:

Defining Supply Chain
Management in the Journal of Business Logistics (Mentzer, John
T.,William DeWitt, James S. Keebler, Soonhong Min, Nancy
W.Nix, Carlo D. Smith, and Zach G. Zacharia, 2001,
“Defining Supply Chain Management,” Journal of Business
Logistics ,Vol. 22, No. 2, p. 18).
3Basic Concepts of Supply Chain ManagementHugos_ch1.1.qxd  11/5/02  11:30 AM  Page 3
-----------------------------------------------------------------------------------------------
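The sample output above lists the same chunk twice. One possible cause, not confirmed by the notebook, is that the documents were uploaded to the index more than once. Whatever the reason, duplicates add nothing to the context passed to a language model, so they can be filtered out before further use. The helper below is a minimal sketch with a hypothetical name, shown with stand-in objects in place of real search results.

```python
from types import SimpleNamespace


def dedupe_docs(docs):
    """Keep the first document for each distinct page_content, preserving order."""
    seen = set()
    unique = []
    for d in docs:
        key = d.page_content.strip()
        if key not in seen:
            seen.add(key)
            unique.append(d)
    return unique


# Stand-in objects that mimic the page_content attribute of search results.
results = [
    SimpleNamespace(page_content="Defining Supply Chain Management"),
    SimpleNamespace(page_content="Defining Supply Chain Management"),
    SimpleNamespace(page_content="Basic Concepts"),
]
print(len(dedupe_docs(results)))  # Two distinct chunks remain.
```

Because ordering is preserved, the most similar copy of each chunk is the one that is kept.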

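### Generating an Answer from the Retrieved Documents

This code builds a prompt that instructs the model to answer using only the supplied context. It then uses create_stuff_documents_chain to insert the retrieved documents into that prompt, sends the result to a ChatOpenAI model, and prints the answer as plain text.

In [18]: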
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.output_parsers import StrOutputParser

# Chat model with the library's default settings.
llm = ChatOpenAI()

template = """
Answer the question based only on the following context: {context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

# The chain fills {context} with the retrieved documents and parses the reply to a string.
chain = create_stuff_documents_chain(llm, prompt, output_parser=StrOutputParser())

ans = chain.invoke({"question": query, "context": retrieved_docs})
print("Query: " + query)
print("Answer: " + ans)

Query: What is supply chain management?
Answer: Supply chain management is the coordination of production, inventory, location, and transportation among the participants in a supply chain to achieve the best mix of responsiveness and efficiency for the market being served. The goal is to increase sales of goods and services to the final customer while reducing inventory and operating expenses.
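To see what the model actually receives, it helps to reproduce the "stuff" step by hand: the text of every retrieved document is joined into one string and substituted for {context}. The sketch below assumes a blank line as the separator between documents, which is understood to be the chain's default. The function name build_stuffed_prompt and the stand-in documents are hypothetical.

```python
from types import SimpleNamespace

TEMPLATE = """
Answer the question based only on the following context: {context}
Question: {question}
"""


def build_stuffed_prompt(template: str, question: str, docs, separator: str = "\n\n") -> str:
    """Join all document texts and substitute them, with the question, into the template."""
    context = separator.join(d.page_content for d in docs)
    return template.format(context=context, question=question)


# Stand-in objects that mimic the page_content attribute of retrieved documents.
sample = [
    SimpleNamespace(page_content="First chunk."),
    SimpleNamespace(page_content="Second chunk."),
]
print(build_stuffed_prompt(TEMPLATE, "What is supply chain management?", sample))
```

Because every chunk is placed in a single prompt, the number of retrieved documents and the chunk size together determine how much of the model's context window is consumed.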
