<a href="https://colab.research.google.com/github/DeependraChaddha/RAG_Projects/blob/main/RAG_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##ENVIRONMENT SETUP

In [None]:
#INSTALLING REQUIRED PACKAGES
!pip install langchain_community tiktoken langchain_openai langchainhub chromadb langchain youtube-transcript-api pytube

In [None]:
#SETTING LANGSMITH ENVIRONMENT
import os
os.environ["LANGCHAIN_TRACING_V2"]='true'
os.environ["LANGCHAIN_ENDPOINT"]='https://api.smith.langchain.com'
os.environ["LANGCHAIN_API_KEY"]="YOUR_API_KEY" #Replace the placeholder with your LangSmith API key


In [None]:
#OPENAI API KEY
os.environ["OPENAI_API_KEY"]="YOUR_API_KEY" #Replace the placeholder with your OpenAI API key

##ROUTING

LOGICAL ROUTING

Logical routing passes the user's question to an LLM, which chooses the data source (database) that the query should be run against.
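
The key constraint in logical routing is that the router's answer must be one of a fixed set of labels. Below is a minimal, standard-library-only sketch of that constraint. It does not call an LLM, and `validate_route` is a hypothetical helper introduced here for illustration; it is not part of LangChain or Pydantic.

```python
from typing import Literal, get_args

# The same three labels the router is allowed to choose from.
Datasource = Literal["python_docs", "js_docs", "golang_docs"]
ALLOWED = get_args(Datasource)  # tuple of the permitted string values


def validate_route(label: str) -> str:
    """Hypothetical helper: accept a label only if it is an allowed datasource."""
    if label not in ALLOWED:
        raise ValueError(f"Unknown datasource {label!r}; expected one of {ALLOWED}")
    return label


print(validate_route("python_docs"))
```

A label outside the allowed set (for example `"json_docs"`) raises `ValueError` in this sketch, which is roughly the role the `Literal` type plays in the schema handed to the LLM.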

In [None]:
#MAKING REQUIRED IMPORTS
from typing import Literal
from langchain.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI

#DataModel
class RouteQuery(BaseModel): #Routes user query to most suitable datasource
  #The datasource field can only take one of these 3 values
  #Field is used to add metadata to, or customize, the fields of a model
  datasource: Literal["python_docs", "js_docs", "golang_docs"] = Field(
      ..., #The ellipsis tells Pydantic that this field cannot be omitted
      description="Given a user question choose which datasource would be most relevant for answering their question",
  )
#This class gives the format in which the LLM answers the query it is given in the RAG chain

#LLM with function call
llm=ChatOpenAI(model="gpt-3.5-turbo-0125",temperature=0)
structured_llm=llm.with_structured_output(RouteQuery) #This asks the llm to provide answer in the structure/format specified by class defined above(inherited from BaseModel class of Pydantic)

#Prompt
system="""You are an expert at routing a user question to the appropriate data source.

Based on the programming language the question is referring to, route it to the relevant data source."""

prompt=ChatPromptTemplate.from_messages([
    ("system",system),
    ("human","{question}"),
    ]
)

#Define Router
router=prompt|structured_llm #This chain takes the prompt and pipes it to the llm which gives an output structured according to RouteQuery

In [None]:
question="""Why doesn't the following code work:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(["human", "speak in {language}"])
prompt.invoke("french")
"""
#INVOKING THE ROUTER CHAIN
result= router.invoke({"question":question})

In [None]:
result

In [None]:
result.datasource

In [None]:
def choose_route(result):
  #Map the datasource chosen by the router to the chain that should handle it
  if "python_docs" in result.datasource.lower():
    return "chain for python_docs"
  elif "js_docs" in result.datasource.lower():
    return "chain for js_docs"
  else:
    return "chain for golang_docs"

from langchain_core.runnables import RunnableLambda
#Wrap the function in RunnableLambda so it can be used in a LangChain pipeline
full_chain=router|RunnableLambda(choose_route)

In [None]:
#INVOKE FULL CHAIN
full_chain.invoke({"question":question})
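
As the number of data sources grows, the if/elif ladder can be swapped for a lookup table. The sketch below is one possible variant, not the notebook's method. `ROUTES` and `choose_route_from_table` are hypothetical names, the returned strings are placeholders for real chains, and a `SimpleNamespace` stands in for the router's structured output so the example runs without an API key.

```python
from types import SimpleNamespace

# Hypothetical lookup table: datasource label -> placeholder for the chain to run.
ROUTES = {
    "python_docs": "chain for python_docs",
    "js_docs": "chain for js_docs",
    "golang_docs": "chain for golang_docs",
}


def choose_route_from_table(result):
    """Return the placeholder chain for the datasource chosen by the router."""
    key = result.datasource.lower()
    if key not in ROUTES:
        # Fail loudly instead of silently falling through to a default route.
        raise KeyError(f"No route registered for datasource {key!r}")
    return ROUTES[key]


# Stand-in for the RouteQuery object returned by the router.
fake_result = SimpleNamespace(datasource="js_docs")
print(choose_route_from_table(fake_result))
```

Because the function takes a single argument, it could be wrapped in `RunnableLambda` in the same way as `choose_route`.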

SEMANTIC ROUTING

Semantic routing embeds the query, compares it with the embeddings of a set of prompt templates, and passes the most similar template to the LLM together with the original question. The templates are written for the kinds of problems to be solved.
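
The selection step can be illustrated without calling an embedding model. The sketch below uses made-up 3-dimensional vectors in place of real embeddings (an assumption for illustration only; actual embeddings have many more dimensions), and `cosine`, `TEMPLATE_VECTORS`, and `pick_template` are hypothetical names, not LangChain APIs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for the embeddings of the two prompt templates.
TEMPLATE_VECTORS = {
    "physics": [0.9, 0.1, 0.0],
    "math": [0.1, 0.9, 0.2],
}


def pick_template(query_vector):
    """Return the name of the template whose vector is most similar to the query."""
    return max(
        TEMPLATE_VECTORS,
        key=lambda name: cosine(query_vector, TEMPLATE_VECTORS[name]),
    )


# A toy query vector that points mostly in the "physics" direction.
print(pick_template([0.8, 0.2, 0.1]))
```

The notebook code performs the same comparison with real embeddings and picks the winner with `argmax`.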

In [None]:
#MAKE ALL REQUIRED IMPORTS
from langchain.utils.math import cosine_similarity
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

In [None]:
#2 PROMPTS
physics_template="""You are a very smart physics professor. \
You are great at answering questions about physics in a concise and easy to understand manner. \
When you don't know the answer to a question you admit that you don't know.

Here is a question:
{query}"""

math_template="""You are a very good mathematician. You are great at answering math questions. \
You are so good because you are able to break down hard problems into their component parts, \
answer the component parts, and then put them together to answer the broader question.

Here is a question:
{query}"""

#Embed prompts

#Specify embedding
embeddings=OpenAIEmbeddings()

#Give list of templates
prompt_templates=[physics_template,math_template]

#Make embeddings of all the templates
prompt_embeddings=embeddings.embed_documents(prompt_templates)

In [None]:
#Find prompt suitable for the question
def prompt_router(input):
  #1. This function takes the query as input and returns the prompt template most similar to it
  #1.1. Embed Question
  query_embedding=embeddings.embed_query(input["query"])

  #1.2. Compute Similarity
  similarity=cosine_similarity([query_embedding], prompt_embeddings)[0]
  most_similar=prompt_templates[similarity.argmax()]

  #1.3 Print which template is being used
  print("Using Math Template" if most_similar==math_template else "Using Physics Template")

  #1.4 Return prompt template
  return PromptTemplate.from_template(most_similar)

In [None]:
#Make entire chain
chain=(
    {"query":RunnablePassthrough()}
    |RunnableLambda(prompt_router)
    |ChatOpenAI()
    |StrOutputParser()
)

#Invoke chain
print(chain.invoke("What's a Black Hole?"))