### Models (LLM and ChatModel)
* LangChain provides a framework for applying LLMs to real-world problems across a range of use cases.
* LangChain provides two types of models:

#### LLMs: 
These are general-purpose language models that can be used for any task. The prompt is a plain string (or a string template), much like the text typed into ChatGPT.

In [1]:
# Set your API key : https://platform.openai.com/account/api-keys
# store the API key in Environment variable : https://networkdirection.net/python/resources/env-variable/
import os
API_KEY = os.environ.get('OpenAI_API_Key')
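
`os.environ.get` returns `None` when the variable is unset, so a missing key only surfaces later, at the first API call. A minimal sketch of a fail-fast lookup follows; `require_env` is a hypothetical helper, not part of LangChain or the OpenAI client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Usage (the variable name matches the cell above):
# API_KEY = require_env("OpenAI_API_Key")
```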

In [2]:
from langchain.llms import OpenAI
llm = OpenAI(openai_api_key=API_KEY)

In [2]:
llm("Explain how AI wil change the world")

'\n\nAI is projected to have a monumental impact on the world. It is likely to change the way we live, work, and interact with each other. AI is being used to automate mundane tasks, create personalized services, and create new ways to interact with our environment. AI is providing ways to make decisions more quickly and accurately, and to automate complex processes. AI is being used to automate medical diagnostics, improve agricultural yields, and even to develop new medicines. AI is also being used to improve customer service, automate financial transactions, and even to control self-driving cars. AI is also being used to analyze large amounts of data to identify patterns, predict outcomes, and make better decisions. As AI continues to develop, it will be used to create even more innovative solutions and services that will revolutionize our lives.'

#### ChatModels: 
These models use LLMs under the hood but are tailored to chatbots. They work with three message types:

* HumanMessage : The prompt given by the user

* AIMessage : The response given by the LLM

* SystemMessage: Context passed to the chat model describing its role.

In [2]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage,SystemMessage

In [6]:
messages = [
    SystemMessage(content="You are a Grammar Teacher who responds Yes for correct Grammar input"),
    HumanMessage(content="I love, programming."),
]

chat = ChatOpenAI(openai_api_key=API_KEY)
chat(messages)

AIMessage(content='Yes.', additional_kwargs={}, example=False)
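
Chat-style APIs typically receive the conversation as an ordered list of role-tagged messages. The sketch below models that idea with a plain dataclass. `Message` and `to_payload` are illustrative stand-ins, not LangChain's own types, and the role names (`system`, `user`, `assistant`) follow the OpenAI chat format.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "system", "user" (human) or "assistant" (AI)
    content: str


def to_payload(messages):
    """Convert messages into the list-of-dicts shape used by chat APIs."""
    return [{"role": m.role, "content": m.content} for m in messages]


conversation = [
    Message("system", "You are a Grammar Teacher who responds Yes for correct Grammar input"),
    Message("user", "I love, programming."),
]
payload = to_payload(conversation)
print([item["role"] for item in payload])
```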

#### Prompt Templates

How can this prompt be made more customizable? LangChain provides predefined prompt templates so that prompts need not be written from scratch each time. These templates come in different types:

* PromptTemplate : Creates a prompt from a string. It is the baseline template type: the string can contain variables, or it can be a complete prompt with no variables at all.

In [19]:
from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template(
    "Write {lines} lines about {topic}"
)
prompt_template.format(lines="5", topic="Australia")

'Write 5 lines about Australia'

In [20]:
# Or
prompt_template = PromptTemplate.from_template(
    "Write 5 lines about Australia"
)
prompt_template.format()

'Write 5 lines about Australia'
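
Conceptually, `from_template` discovers the `{placeholders}` in the string and `format` substitutes values for them. A rough stdlib-only sketch of that idea follows. It mimics the behaviour shown above and is not LangChain's implementation; `template_variables` is a hypothetical helper.

```python
from string import Formatter


def template_variables(template: str):
    """List the {placeholders} in a template, in order of first appearance."""
    names = []
    for _, field_name, _, _ in Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names


template = "Write {lines} lines about {topic}"
print(template_variables(template))
print(template.format(lines="5", topic="Australia"))
```

A template without placeholders simply yields an empty variable list, which is why calling `format()` with no arguments works in the second cell above.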

In [4]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

# app : to check grammar for an English sentence
template = """You are a English Teacher. The user will provide a English sentence as input.
Output Correct if it is grammatically right else output Wrong and rewrite the sentence in the next line"""

system_message_prompt = SystemMessagePromptTemplate.from_template(template) # what this app is all about
human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])


In [18]:
# chain object:
chain = LLMChain(
    llm=ChatOpenAI(openai_api_key=API_KEY, max_retries=3),
    prompt=chat_prompt
)


In [5]:
print('Prompt 1',chain.run("Get out house my you !"))
print('\nPrompt 2',chain.run("You're beautiful"))

Prompt 1 Wrong.
You're saying "Get out house my you!" but the correct sentence should be "Get out of my house!"

Prompt 2 Correct


In [7]:
# wait for 60s if you see RateLimitError
print('\nPrompt 3',chain.run("I love my parents"))
print('\nPrompt 4',chain.run("He have done this task"))


Prompt 3 Correct

Prompt 4 Wrong
He has done this task.


In [None]:
# App : Changing the tone of the input sentence

In [12]:
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

#the imports remain the same
template = "You are a helpful assistant that rewrite the input text according to the {tone} passed."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

chain = LLMChain(
    llm=ChatOpenAI(openai_api_key=API_KEY, max_retries=3),
    prompt=chat_prompt
)


In [9]:
# two variables : text and tone
print(chain.run({'text':'I am a boy','tone':'angry'}))
print(chain.run({'text':'He is cute','tone':'sweet'}))

I am a boy.
He is adorable.


In [13]:
# wait for 60s if you see RateLimitError
print(chain.run({'text':'His friend died','tone':'regretful'}))
print(chain.run({'text':'I kicked his friend Adam','tone':'regretful'}))

I'm sorry to hear that his friend passed away.
I deeply regret kicking his friend Adam.


In [None]:
# App : for language translation

In [21]:
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
#imports remain the same
from langchain.schema import BaseOutputParser
class CommaSeparatedListOutputParser(BaseOutputParser):
    """Parse the output of an LLM call to a comma-separated list."""
    def parse(self, text: str):
        """Parse the output of an LLM call."""
        return text.strip().split(", ")

template = "You are a helpful assistant that translates input text to {output_language}. Separate the outputs for each language by comma."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

chain = LLMChain(
    llm=ChatOpenAI(openai_api_key=API_KEY, max_retries=3), #request_timeout=200
    prompt=chat_prompt,
    output_parser=CommaSeparatedListOutputParser()
)

In [16]:
print(chain.run({'text':'I am a boy','input_language':'english','output_language':'hindi,italian'}))
print(chain.run({'text':'I am a boy','input_language':'english','output_language':'French,German'}))

['मैं एक लड़का हूँ', 'Io sono un ragazzo.']
['Je suis un garçon', 'Ich bin ein Junge']


In [17]:
print(chain.run({'text':'I am a boy','input_language':'english','output_language':'Arabic'}))
print(chain.run({'text':'I am a boy','input_language':'english','output_language':'Sanskrit, bhojpuri'}))

['أنا صبي']
['अहम् बालकः', 'हम छौरा बा']
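
The parser above splits on the exact string `", "`, so a reply such as `a,b` (no space after the comma) or one with a trailing comma would not be split cleanly. A slightly more tolerant variant is sketched below as a plain function; `parse_comma_separated` is a hypothetical name, and its body could serve as the body of `parse`.

```python
def parse_comma_separated(text: str):
    """Split on commas, trim whitespace, and drop empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


print(parse_comma_separated("Je suis un garçon,Ich bin ein Junge"))
print(parse_comma_separated(" a , b, ,c, "))
```

Either version will still split a translation that itself contains a comma, so a less common separator may be a safer choice in the system prompt.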


#### ChatPromptTemplate

ChatPromptTemplate customizes prompts for ChatModels, just as PromptTemplate does for LLMs.

In [23]:
from langchain.prompts import ChatPromptTemplate
from langchain.prompts.chat import SystemMessage, HumanMessagePromptTemplate
from langchain.chat_models import ChatOpenAI

template = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content=("You are a python coder AI that helps user with bugs and writing programs")),
        HumanMessagePromptTemplate.from_template("{text}"),
    ]
)


llm = ChatOpenAI(openai_api_key=API_KEY, max_retries=3)
llm(template.format_messages(text='Check whether a number is prime or not'))

AIMessage(content='Sure! I can help you with that. Here\'s a Python function that checks whether a number is prime or not:\n\n```python\ndef is_prime(n):\n    if n <= 1:\n        return False\n\n    for i in range(2, int(n**0.5) + 1):\n        if n % i == 0:\n            return False\n\n    return True\n```\n\nYou can now call this function and pass in the number you want to check. It will return `True` if the number is prime, and `False` otherwise.\n\nFor example:\n\n```python\nnum = 17\nif is_prime(num):\n    print(f"{num} is prime")\nelse:\n    print(f"{num} is not prime")\n```\n\nOutput:\n```\n17 is prime\n```\n\nLet me know if you need any further assistance!', additional_kwargs={}, example=False)

* Just as ChatModels have three message roles, ChatPromptTemplate has the same three: system, ai, and human.

* Similar to PromptTemplate, we can pass variables in the template to customize the prompt for different use cases.

#### Chain

The examples above covered supporting features of LangChain. Its most important feature is Chains. So what are chains?

Chains can be thought of as programs built around LLMs to perform specific tasks. Internally, a chain links multiple LLM calls together, or links an LLM with other components such as third-party tool integrations.
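
The idea of chaining can be sketched without any library: format a prompt, call a model, then post-process the reply. Everything below is an illustrative assumption. `make_chain` and `fake_llm` are hypothetical names, and the "model" is a stub that upper-cases its input so the example runs offline.

```python
def make_chain(prompt_template, llm, parser=lambda text: text):
    """Return a function that formats the prompt, calls the model, parses the reply."""
    def run(**variables):
        prompt = prompt_template.format(**variables)
        raw = llm(prompt)
        return parser(raw)
    return run


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: echoes the prompt in upper case."""
    return prompt.upper()


toy_chain = make_chain("Translate to {language}: {text}", fake_llm, parser=str.strip)
print(toy_chain(language="French", text="hello"))
```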

#### Memory

ChatGPT remembers earlier turns in a conversation, so follow-up questions can refer to previous questions or answers. How can this memory be added to an LLM?
* Add memory using LangChain so that the bot remembers earlier turns of the conversation.

In [35]:
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory

* Memory is added in the same way a variable is passed to a template (as in the examples above). The format of the template is important here.
* Create a ConversationBufferMemory() object and expose it to the template through the chat_history variable, placed under 'Previous conversation'.
* This buffer memory object is then assigned to the LLMChain() object, enabling it to store the conversation history.

In [39]:
template = """You are a chatbot having a conversation with a human.

Previous conversation:
{chat_history}

New human question: {human_input}
Chatbot:"""

In [None]:
# chat_history : variable for memory

In [40]:
prompt = PromptTemplate(input_variables=["chat_history", "human_input"], template = template)

In [42]:
memory = ConversationBufferMemory(memory_key="chat_history")

In [None]:
llm = OpenAI(openai_api_key=API_KEY, temperature=0)

In [43]:
llm_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=True,
    memory=memory
)

In [44]:
llm_chain.predict(human_input = "Hi there my friend")



> Entering new  chain...
Prompt after formatting:
You are a chatbot having a conversation with a human.

Previous conversation:


New human question: Hi there my friend
Chatbot:

> Finished chain.


' Hi there! How can I help you?'

In [45]:
llm_chain.predict(human_input = "Tell me what is Machine Learning")



> Entering new  chain...
Prompt after formatting:
You are a chatbot having a conversation with a human.

Previous conversation:
Human: Hi there my friend
AI:  Hi there! How can I help you?

New human question: Tell me what is Machine Learning
Chatbot:

> Finished chain.


' Machine Learning is a type of artificial intelligence that allows computers to learn from data without being explicitly programmed. It uses algorithms to identify patterns in data and make predictions or decisions based on those patterns.'

In [46]:
llm_chain.predict(human_input = "How is it different from AI?")



> Entering new  chain...
Prompt after formatting:
You are a chatbot having a conversation with a human.

Previous conversation:
Human: Hi there my friend
AI:  Hi there! How can I help you?
Human: Tell me what is Machine Learning
AI:  Machine Learning is a type of artificial intelligence that allows computers to learn from data without being explicitly programmed. It uses algorithms to identify patterns in data and make predictions or decisions based on those patterns.

New human question: How is it different from AI?
Chatbot:[0m

[1m> Finished chain.[0m


' Machine Learning is a subset of Artificial Intelligence (AI). AI is a broader concept that includes Machine Learning, but also includes other techniques such as natural language processing, computer vision, and robotics. Machine Learning focuses on using data to train algorithms to make predictions or decisions, while AI focuses on creating intelligent systems that can think and act like humans.'

In [47]:
llm_chain.predict(human_input = "How are the two different from Data Science ?")
# it has to remember the previous two questions context



> Entering new  chain...
Prompt after formatting:
You are a chatbot having a conversation with a human.

Previous conversation:
Human: Hi there my friend
AI:  Hi there! How can I help you?
Human: Tell me what is Machine Learning
AI:  Machine Learning is a type of artificial intelligence that allows computers to learn from data without being explicitly programmed. It uses algorithms to identify patterns in data and make predictions or decisions based on those patterns.
Human: How is it different from AI?
AI:  Machine Learning is a subset of Artificial Intelligence (AI). AI is a broader concept that includes Machine Learning, but also includes other techniques such as natural language processing, computer vision, and robotics. Machine Learning focuses on using data to train algorithms to make predictions or decisions, while AI focuses on creating intelligent systems that can think and act like humans.

New human question: How are the two different from Data Science

' Data Science is a field of study that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. Machine Learning and Artificial Intelligence are two techniques used in Data Science to analyze data and make predictions or decisions. Machine Learning focuses on using data to train algorithms to make predictions or decisions, while AI focuses on creating intelligent systems that can think and act like humans.'
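
The transcripts above show the history growing by one Human/AI pair per call. A minimal, library-free sketch of that mechanism follows. `BufferMemory`, `ask`, and `stub_llm` are hypothetical names, and the stub model only reports how many earlier turns it can see, so no API key is needed.

```python
class BufferMemory:
    """Keeps every turn and renders it as text for the prompt's history slot."""

    def __init__(self):
        self.turns = []  # list of (human, ai) pairs

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def render(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


TEMPLATE = (
    "You are a chatbot having a conversation with a human.\n\n"
    "Previous conversation:\n{chat_history}\n\n"
    "New human question: {human_input}\nChatbot:"
)


def ask(memory, llm, human_input):
    """Fill the template with the stored history, call the model, store the new turn."""
    prompt = TEMPLATE.format(chat_history=memory.render(), human_input=human_input)
    reply = llm(prompt)
    memory.save(human_input, reply)
    return reply


def stub_llm(prompt):
    # Placeholder model: counts the earlier turns visible in the prompt.
    return f"I can see {prompt.count('Human: ')} earlier turn(s)."


memory = BufferMemory()
first = ask(memory, stub_llm, "Hi there my friend")
second = ask(memory, stub_llm, "Tell me what is Machine Learning")
print(first)
print(second)
```

Because the whole history is re-sent on every call, the prompt grows with the conversation, which is visible in the increasingly long transcripts above.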

#### Retrieval

Often we want to give the LLM additional context, drawn from either public or private sources. For example, when working for some organization XYZ, we might want to run Q&A over an internal report. How do we give the LLM context about this report? By loading it somewhere the LLM can access it. Retrieval components make this external context available while the LLM generates a response, an approach known as Retrieval Augmented Generation (RAG). The external resource can be a database, a CSV, a text file, a PDF, or a YouTube video. A retrieval system has the following major components:

* Document loader: Loads the external resource into memory.

* Document Transformer: Mainly a preprocessing step applied once the external document is loaded in memory (such as tokenizing the text or splitting it by characters).

* Embedding model: To store these documents, we first need to generate embeddings for their text.

* Vector Databases/Store: Databases specialized in storing vector data.

* Retrievers: Retrieve the relevant entries from the vector DB when a prompt is run through the LLM.
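
The embed, store, and retrieve steps listed above can be sketched with the standard library alone. This is a toy under stated assumptions: the "embedding" is a bag-of-words count rather than a learned model, `ToyVectorStore` is a hypothetical class, and the three chunks are made-up sentences.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (real systems use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Stores (chunk, vector) pairs and returns the chunks closest to a query."""

    def __init__(self, chunks):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def retrieve(self, query: str, k: int = 2):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


chunks = [
    "Named entity recognition is part of information extraction.",
    "Gradient descent updates the weights during training.",
    "The Titanic dataset lists passengers, fares and cabins.",
]
store = ToyVectorStore(chunks)
print(store.retrieve("How are weights updated?", k=1))
```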

#### YouTube video analysis: Question Answering with LangChain

In [52]:
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader, YoutubeLoader, GoogleApiYoutubeLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma


In [None]:
# https://www.youtube.com/watch?v=3w0EhxiebUA : Named Entity Recognition detection Youtube video

In [54]:
loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=3w0EhxiebUA",
    add_video_info=True,  # add video info : include metadata
    language=["en", "id"],
    translation="en",
)
data = loader.load()

In [55]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(data)
# Preprocessing the text from the video
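
`chunk_size` caps the length of each chunk and `chunk_overlap` controls how many characters consecutive chunks share. The sketch below illustrates those two parameters with a plain sliding window. It is not how `CharacterTextSplitter` works internally (that class splits on a separator first and then merges the pieces), and `split_by_chars` is a hypothetical helper.

```python
def split_by_chars(text: str, chunk_size: int, chunk_overlap: int = 0):
    """Cut text into windows of chunk_size characters, each overlapping the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


print(split_by_chars("abcdefghij", chunk_size=4, chunk_overlap=1))
```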

In [56]:
embeddings = OpenAIEmbeddings(openai_api_key=API_KEY)
llm = OpenAI(openai_api_key=API_KEY)

In [57]:
docsearch = Chroma.from_documents(texts, embeddings)

##### Question Answering on the text of the YouTube video

In [61]:
qa = RetrievalQA.from_chain_type(llm = llm, chain_type= "stuff", retriever= docsearch.as_retriever())
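
With `chain_type="stuff"`, the retrieved chunks are all placed ("stuffed") into a single prompt together with the question. A rough sketch of that assembly step follows. The prompt wording and `build_stuff_prompt` are assumptions for illustration, not the text LangChain actually sends.

```python
def build_stuff_prompt(question: str, documents) -> str:
    """Concatenate every retrieved chunk into one context block, then append the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_stuff_prompt(
    "What are CRFs used for?",
    ["CRFs are used for named entity recognition.", "Weights are updated using gradient descent."],
)
print(prompt)
```

Because every chunk lands in one prompt, this approach only works while the combined text fits in the model's context window.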

In [62]:
print('\n'.join(qa.run('Summarize the text in short points').split('\n')))

Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1


 -CRFs are used for named entity recognition 
-Information extraction refers to extracting important information from a given text 
-Named entity recognition is part of information extraction
-Segmentation ambiguity and tag assignment ambiguity are two major problems that can be faced 
-Feature functions are used to generate word-level features 
-CRF equation is used to calculate the probability of a given tag sequence 
-Weights are assigned to different feature functions 
-CRFs are trained using training data and weight updation is done using gradient descent


In [63]:
print('\n'.join(qa.run('Generate some quiz questions from this text').split('\n')))

Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1




1. What is the equation used in CRF?
2. What is a feature function?
3. What does the letter "O" refer to in IOB tagging?
4. What is the denominator of the CRF equation calculated from?
5. How are weights updated in CRF?


In [None]:
# Try this with  another Youtube video : https://www.youtube.com/watch?v=Gn_PjruUtrc

#### CSV file analysis using LangChain

In [67]:
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

from langchain.agents.agent_types import AgentType
from langchain.agents import create_csv_agent


In [69]:
# Creating a csv agent
agent = create_csv_agent(
    OpenAI(temperature=0, openai_api_key=API_KEY),
    "titanic.csv",
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)


In [70]:
agent.run('List down the most expensive cabins')



> Entering new  chain...
Thought: I need to find the cabins with the highest fare
Action: python_repl_ast
Action Input: df.sort_values(by='fare', ascending=False)['cabin'].head()
Observation: 183           B101
302            NaN
49     B51 B53 B55
50     B51 B53 B55
113    C23 C25 C27
Name: cabin, dtype: object
Thought: I now know the most expensive cabins
Final Answer: B101, NaN, B51 B53 B55, C23 C25 C27

> Finished chain.


'B101, NaN, B51 B53 B55, C23 C25 C27'
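
The final answer above includes `NaN` because the generated query sorts by fare without dropping rows whose cabin is missing. A stdlib-only sketch of a cleaner version of the same query follows. The rows are made-up placeholders, not values from `titanic.csv`, and `most_expensive_cabins` is a hypothetical helper.

```python
import csv
import io

# Made-up sample rows for illustration only (not taken from titanic.csv).
SAMPLE = """fare,cabin
500.0,A1
450.0,
300.0,B2 B4
300.0,B2 B4
90.0,C3
"""


def most_expensive_cabins(csv_text: str, n: int = 3):
    """Return up to n distinct, non-missing cabins ordered by descending fare."""
    rows = csv.DictReader(io.StringIO(csv_text))
    priced = [(float(r["fare"]), r["cabin"]) for r in rows if r["fare"] and r["cabin"]]
    priced.sort(key=lambda pair: pair[0], reverse=True)
    cabins = []
    for _, cabin in priced:
        if cabin not in cabins:  # skip cabins already listed
            cabins.append(cabin)
    return cabins[:n]


print(most_expensive_cabins(SAMPLE))
```

With the agent, a similar effect could be attempted by phrasing the question more precisely, for example by asking it to ignore missing cabin values.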

In [71]:
agent.run('Give some interesting facts from the data')



> Entering new  chain...
Thought: I should look at the data and see what stands out
Action: python_repl_ast
Action Input: df.describe()
Observation:             pclass     survived          age        sibsp        parch  \
count  1309.000000  1309.000000  1046.000000  1309.000000  1309.000000   
mean      2.294882     0.381971    29.881138     0.498854     0.385027   
std       0.837836     0.486055    14.413493     1.041658     0.865560   
min       1.000000     0.000000     0.170000     0.000000     0.000000   
25%       2.000000     0.000000    21.000000     0.000000     0.000000   
50%       3.000000     0.000000    28.000000     0.000000     0.000000   
75%       3.000000     1.000000    39.000000     1.000000     0.000000   
max       3.000000     1.000000    80.000000     8.000000     9.000000   

              fare        body  
count  1308.000000  121.000000  
mean     33.295479  160.809917  
std      51.758668   97.696922  
min       0.

'The average age of passengers is 29.88, the average fare is 33.29, and the average body count is 160.81.'

#### PDF file analysis using LangChain

In [None]:
# !pip install chromadb langchain openai tiktoken unstructured pypdfium2

In [73]:
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI

from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.document_loaders import PyPDFium2Loader

In [74]:
# Read the Attention is all you need research paper
loader = PyPDFium2Loader("NIPS-2017-attention-is-all-you-need-Paper.pdf")
data = loader.load()

In [76]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(data)

In [77]:
embeddings = OpenAIEmbeddings(openai_api_key=API_KEY)
llm = OpenAI(openai_api_key=API_KEY)

In [78]:
# Storing the embeddings - vectordb
docsearch = Chroma.from_documents(texts, embeddings)

In [79]:
# Retrieval QA chain
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever = docsearch.as_retriever())

In [80]:
qa.run('what is this pdf about?')

' This pdf is about the Transformer, a new simple network architecture based solely on attention mechanisms which can be used for sequence transduction tasks such as language modeling and machine translation.'

In [81]:
qa.run('who is the author of this paper ?')

' The authors of this paper are Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin.'

#### End