# End-to-end LLM Application using LangChain

- References
  - Theoretical concepts - Prof. [Mochen Yang](https://mochenyang.github.io/) MSBA 6461 - Advanced AI for Business (NLP and Reinforcement Learning)
  - Practical reference - Krish Naik's [End to End LLM App using Langchain](https://github.com/krishnaik06/Complete-Langchain-Tutorials/tree/main/LLM%20Generic%20APP).

<img src="LLMAppUsingLangchain.png" style="height:400px; width: 750px" />

Generated using Lucidchart.

#### Import required libraries

In [1]:
# Document loading and chunking
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# OpenAI embeddings and chat model (ChatOpenAI is imported once, from langchain_openai)
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# Pinecone vector store integration
from langchain_pinecone import PineconeVectorStore

# Question-answering chain
from langchain.chains.question_answering import load_qa_chain

In [2]:
from dotenv import load_dotenv
load_dotenv()

True
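
The `True` above only reports that `load_dotenv()` set at least one variable from a `.env` file; it does not confirm that the keys needed by later cells are present. A minimal sketch of an early check, assuming the two variable names used later in this notebook (`OPENAI_API_KEY` and `PINECONE_API_KEY`). The helper name `missing_env_vars` is hypothetical:

```python
import os

# Variable names taken from the later cells of this notebook
REQUIRED_KEYS = ("OPENAI_API_KEY", "PINECONE_API_KEY")


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_KEYS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running a check like this right after `load_dotenv()` surfaces a missing key before any API call is made.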

In [3]:
import os

#### Read the (pdf) document

In [4]:
def read_doc(directory):
    file_loader = PyPDFDirectoryLoader(directory)
    documents = file_loader.load()
    return documents

In [5]:
documents = read_doc('C:/Users/Atharva J/Desktop/UMN/FullTimeJobApplications/ResumeFeedbackChanges-Sridhar/')

In [6]:
latest_resume = documents[-1]
latest_resume

Document(metadata={'source': 'C:\\Users\\Atharva J\\Desktop\\UMN\\FullTimeJobApplications\\ResumeFeedbackChanges-Sridhar\\AtharvaJResume-14-DS-V4.pdf', 'page': 0}, page_content='ATHARVA  JOSHI  \n(763)-202-0720  ⚫ Email  ⚫ LinkedIn  ⚫ GitHub  ⚫ Medium    \n  \nSUMMARY  \nResults -driven data scientist with 3.5 years in retail, specializing in predictive modeling, machine learning, and strategic \ncollaboration. Proven success in delivering analytics solutions that fuel business growth and informed decision -making.  \n \nEXPERIENCE  \nCARLSON ANALYTICS LAB                  Minneapolis , MN \nAnalytics Lead                                    Jul 2023 - May 2024  \nImpact Analysis for Insiders Loyalty Program at a major Minneapolis mall. ( Python, SQL, PowerBI , JIRA ) \n● Steered  and coordinated analytics initiatives, working closely with cross -functional  teams to align loyalty program \ngoals with data -driven strategies.  \n● Led a team to segment 13,000+ insiders  in different clu

In [7]:
latest_resume.page_content

'ATHARVA  JOSHI  \n(763)-202-0720  ⚫ Email  ⚫ LinkedIn  ⚫ GitHub  ⚫ Medium    \n  \nSUMMARY  \nResults -driven data scientist with 3.5 years in retail, specializing in predictive modeling, machine learning, and strategic \ncollaboration. Proven success in delivering analytics solutions that fuel business growth and informed decision -making.  \n \nEXPERIENCE  \nCARLSON ANALYTICS LAB                  Minneapolis , MN \nAnalytics Lead                                    Jul 2023 - May 2024  \nImpact Analysis for Insiders Loyalty Program at a major Minneapolis mall. ( Python, SQL, PowerBI , JIRA ) \n● Steered  and coordinated analytics initiatives, working closely with cross -functional  teams to align loyalty program \ngoals with data -driven strategies.  \n● Led a team to segment 13,000+ insiders  in different clusters using K -Means clustering, improving customer \ntargeting by 30%.  \n● Boosted loyalty program engagement by 23% through tailored promotions  to insider  preferences  usin

In [8]:
len(documents)

9
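
`PyPDFDirectoryLoader` returns one `Document` per PDF page, so the 9 entries counted above are pages across all PDFs in the folder, and `documents[-1]` is simply the last page loaded. A minimal sketch of counting pages per source file. `FakeDocument` is a hypothetical stand-in for LangChain's `Document` so that the snippet runs without the PDFs, and the file names are made up:

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in mirroring the two attributes used in this notebook."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def pages_per_source(docs):
    """Count how many page-level documents came from each source file."""
    return Counter(doc.metadata.get("source", "<unknown>") for doc in docs)


# Made-up sample data for illustration only
sample_docs = [
    FakeDocument("page text", {"source": "resume_a.pdf", "page": 0}),
    FakeDocument("page text", {"source": "resume_a.pdf", "page": 1}),
    FakeDocument("page text", {"source": "resume_b.pdf", "page": 0}),
]

counts = pages_per_source(sample_docs)
for source, n_pages in counts.items():
    print(source, n_pages)
```

With the real data, `pages_per_source(documents)` would show which PDF each of the loaded pages belongs to.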

#### Convert the document into chunks

In [9]:
# Convert into chunks due to the model's maximum token limit
def chunk_data(docs, chunk_size=800, chunk_overlap=50):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    doc = text_splitter.split_documents(docs)
    return doc

In [10]:
chunk_data(documents)

[Document(metadata={'source': 'C:\\Users\\Atharva J\\Desktop\\UMN\\FullTimeJobApplications\\ResumeFeedbackChanges-Sridhar\\AtharvaJoshiDS-V1-CA.pdf', 'page': 0}, page_content='ATHARVA JOSHI\nSan Francisco, CA|joshi461@umn.edu |(763)-202-0720|https://linkedin.com/in/attharvaj3147\nWork Experience\nCARLSON ANALYTICS LAB\nAnalytics LeadMinneapolis, MN\nJan 2024 - May 2024\n•Directed comprehensive analytics initiatives for a flagship shopping center in Minneapolis, collaborating with diverse teams \nto align loyalty program objectives with data-driven strategies.\n•Led a team to segment 13,000+ subscribers using K-Means clustering, improving customer targeting by 30%.\n•Boosted loyalty program engagement by 23% through targeted promotions based on store co-visitation patterns identified \nusing association rule mining.\nCARLSON ANALYTICS LAB\nData Science ConsultantMinneapolis, MN\nJul 2023 - Dec 2023'),
 Document(metadata={'source': 'C:\\Users\\Atharva J\\Desktop\\UMN\\FullTimeJobApplicat
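
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch: a fixed-width character window that steps forward by `chunk_size - chunk_overlap`. This is not how `RecursiveCharacterTextSplitter` works internally (it prefers to split on separators such as paragraph and line breaks before falling back to smaller units). The function `sliding_chunks` is hypothetical and only illustrates the two parameters:

```python
def sliding_chunks(text, chunk_size=800, chunk_overlap=50):
    """Split text into fixed-width windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# Synthetic 2000-character text, for illustration only
text = "".join(str(i % 10) for i in range(2000))
chunks = sliding_chunks(text)
print([len(c) for c in chunks])
```

The last 50 characters of each chunk reappear at the start of the next one, which helps a sentence that straddles a boundary stay retrievable from at least one chunk.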

#### Generating OpenAI embeddings for previously created chunks

In [11]:
embeddings = OpenAIEmbeddings(api_key=os.environ['OPENAI_API_KEY'])
embeddings

OpenAIEmbeddings(client=<openai.resources.embeddings.Embeddings object at 0x0000021E2592B530>, async_client=<openai.resources.embeddings.AsyncEmbeddings object at 0x0000021E2592B890>, model='text-embedding-ada-002', dimensions=None, deployment='text-embedding-ada-002', openai_api_version='', openai_api_base=None, openai_api_type='', openai_proxy='', embedding_ctx_length=8191, openai_api_key=SecretStr('**********'), openai_organization=None, allowed_special=None, disallowed_special=None, chunk_size=1000, max_retries=2, request_timeout=None, headers=None, tiktoken_enabled=True, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={}, skip_empty=False, default_headers=None, default_query=None, retry_min_seconds=4, retry_max_seconds=20, http_client=None, http_async_client=None, check_embedding_ctx_length=True)

In [12]:
vectors = embeddings.embed_query("How are you?")
len(vectors)

1536
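
Each text is mapped to a 1536-dimensional vector, and similarity search ranks stored vectors by how close they are to the query vector. A minimal sketch of that ranking with toy 3-dimensional vectors, assuming cosine similarity as the metric (the metric configured on the Pinecone index is not shown in this notebook). `cosine_similarity` and `top_k` are hypothetical helper names, and the vectors are made up:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Return the k (label, score) pairs most similar to the query vector."""
    scored = [(label, cosine_similarity(query_vec, vec)) for label, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors standing in for real 1536-dimensional embeddings
stored_vectors = {
    "machine learning experience": [0.9, 0.1, 0.0],
    "education": [0.1, 0.9, 0.0],
    "contact details": [0.0, 0.1, 0.9],
}
query_vector = [1.0, 0.2, 0.0]

results = top_k(query_vector, stored_vectors, k=2)
for label, score in results:
    print(f"{label}: {score:.3f}")
```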

#### Create a vector search DB using Pinecone

In [13]:
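# Pinecone reads its API key from the PINECONE_API_KEY environment variable.
# Store the key in the .env file loaded earlier instead of hardcoding it in the notebook.
if not os.environ.get('PINECONE_API_KEY'):
    raise RuntimeError("PINECONE_API_KEY is not set; add it to your .env file")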

In [14]:
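# Embed the chunked documents (not the full pages) and store them in the Pinecone index
vectorstore_from_docs = PineconeVectorStore.from_documents(
    chunk_data(documents),
    index_name="llmlangchainproject1",
    embedding=embeddings
)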

In [15]:
# Initialize the OpenAI chat model
openai_model = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.4)

# Load a question-answering chain that passes all retrieved documents to the model in one prompt ("stuff")
chain = load_qa_chain(openai_model, chain_type="stuff")

# Function to retrieve matching documents from VectorDB
def retrieve_query(query, k=2):
    matching_results = vectorstore_from_docs.similarity_search(query, k=k)
    return matching_results

# Function to retrieve answers using the QA chain
def retrieve_answers(query):
    doc_search = retrieve_query(query)
    # print(type(doc_search))
    
    response = chain.invoke({"input_documents": doc_search, "question": query})
    return response
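
With `chain_type="stuff"`, the retrieved documents are all placed ("stuffed") into a single prompt together with the question. A rough sketch of that idea in plain Python. The prompt wording and the `build_stuff_prompt` helper are illustrative assumptions, not LangChain's actual template:

```python
def build_stuff_prompt(question, documents, separator="\n\n"):
    """Concatenate all retrieved texts into one context block followed by the question."""
    context = separator.join(doc.strip() for doc in documents if doc.strip())
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Two short snippets standing in for the k=2 chunks returned by similarity search
retrieved_chunks = [
    "Led a team to segment 13,000+ subscribers using K-Means clustering.",
    "Boosted loyalty program engagement by 23% through targeted promotions.",
]

prompt = build_stuff_prompt("Is this person an expert in Machine Learning?", retrieved_chunks)
print(prompt)
```

Because everything goes into one prompt, `k` and the chunk size together bound the prompt length, so a small `k` helps keep the request within the model's context window.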

Note: running cell 15 emits a LangChain deprecation warning for `load_qa_chain`, which links to the migration guides for each chain type:

- stuff: https://python.langchain.com/v0.2/docs/versions/migrating_chains/stuff_docs_chain
- map_reduce: https://python.langchain.com/v0.2/docs/versions/migrating_chains/map_reduce_chain
- refine: https://python.langchain.com/v0.2/docs/versions/migrating_chains/refine_chain
- map_rerank: https://python.langchain.com/v0.2/docs/versions/migrating_chains/map_rerank_docs_chain

See also the guides on retrieval and question answering: https://python.langchain.com/v0.2/docs/how_to/#qa-with-rag


In [16]:
# Example usage
question1 = "Is this person an expert in Machine Learning?"
answer1 = retrieve_answers(question1)
print(answer1['output_text'])

Yes, based on the detailed work experience and skills listed, Atharva Joshi appears to be an expert in machine learning. He has experience implementing various machine learning techniques, such as clustering, deep learning, predictive modeling, and time series forecasting, using tools like Python, TensorFlow, and AWS services. Additionally, his certifications, such as being an AWS Certified Machine Learning Specialty, further support his expertise in machine learning.


In [17]:
question2 = "Based on his resume, do you think Atharva will be able to work on Generative AI and Deep learning use cases?"
answer2 = retrieve_answers(question2)
print(answer2['output_text'])

Based on Atharva's resume, he has experience with deep learning models, such as LSTM models with attention mechanisms in TensorFlow. Additionally, he has implemented a deep learning model for employee review classification. These experiences suggest that Atharva has the skills and background to work on Generative AI and Deep Learning use cases.


In [18]:
question3 = "Can Atharva be a good professional full stack web developer?"
answer3 = retrieve_answers(question3)
print(answer3['output_text'])

Based on the information provided, Atharva Joshi has a strong background in data engineering, analytics, and related technologies. However, there is no specific mention of experience or skills related to full stack web development in the provided context. Therefore, it's unclear if Atharva would be a good professional full stack web developer.


In [19]:
question4 = (
    "What do you think of this profile for a data scientist position? "
    "Rate the relevance of this profile out of 10 for a data scientist position, "
    "based on the general job description of a data science role."
)
answer4 = retrieve_answers(question4)
print(answer4['output_text'])

Based on the provided experience, skills, and education, I would rate this profile a 9 out of 10 for a data scientist position. The candidate has a strong background in predictive modeling, machine learning, and data analytics, with relevant experience in retail and various data science techniques. Additionally, the certifications, technical skills, and successful project outcomes demonstrate a high level of expertise in the field.
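
The answer above embeds the score in free text. If the rating needs to be compared across resumes, it can be pulled out with a small regular expression. A sketch, assuming the model phrases the score as "N out of 10" or "N/10" (which it is not guaranteed to do). `extract_rating` is a hypothetical helper:

```python
import re

# Matches "9 out of 10", "7.5/10", "8 / 10" (assumed phrasings, not guaranteed by the model)
_RATING_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(?:out of|/)\s*10\b", re.IGNORECASE)


def extract_rating(answer_text):
    """Return the first 'N out of 10' style score as a float, or None if absent."""
    match = _RATING_PATTERN.search(answer_text)
    return float(match.group(1)) if match else None


sample_answer = "I would rate this profile a 9 out of 10 for a data scientist position."
print(extract_rating(sample_answer))
```

Returning `None` when no score is found makes it explicit that the model answered without a rating, rather than silently substituting a default.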
