## LangChain Chains - OpenHermes-2.5-Mistral-7B-GPTQ

1. GPTQ
- https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ

2. LangChain documentation
https://python.langchain.com/docs/modules/chains/

3. Goal: learn LangChain chains

4. https://www.mlexpert.io/prompt-engineering/langchain-quickstart-with-llama-2

## Initial environment setup

In [None]:
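import os
from pathlib import Path

# Add the user's local bin directory (where pip installs CLI tools) to PATH
HOME = str(Path.home())
add_binary_path = HOME + '/.local/bin'
os.environ['PATH'] = os.environ['PATH'] + ':' + add_binary_path

# Record the current working directory (the IPython shell capture returns a list of lines)
current_folder = !pwd
current_folder = current_folder[0]
current_folder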

## Check the CUDA version and whether a GPU is available
If no GPU is available, click Connected (on the right) -> Change runtime type -> T4 GPU.

In [None]:
!nvidia-smi
import torch
torch.cuda.is_available()

## Install packages

In [None]:
!pip install chromadb cohere gdown kaleido langchain openai pyngrok pypdf python-dotenv sentence-transformers tiktoken -q
!pip install accelerate bitsandbytes hf_transfer huggingface_hub optimum transformers -q 
!pip install auto-gptq
#!pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  -q # Use cu117 if on CUDA 11.7

### Download model

In [None]:
%%bash
# Download model
mkdir -p /content/OpenHermes-2.5-Mistral-7B-GPTQ
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download TheBloke/OpenHermes-2.5-Mistral-7B-GPTQ --local-dir /content/OpenHermes-2.5-Mistral-7B-GPTQ --local-dir-use-symlinks False

### Load Model
The lower the temperature, the more deterministic the model's output. Raising it makes the output more random, which can produce more varied or more creative responses.
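
To make the effect of temperature concrete, here is a minimal sketch (not the model's actual sampling code) that applies a softmax to a few made-up logits at different temperatures. The function name `softmax_with_temperature` and the logit values are illustrative assumptions.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature nearly all probability mass lands on the top-scoring token; at a higher temperature the distribution flattens, so sampling picks the other tokens more often.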


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.llms.huggingface_pipeline import HuggingFacePipeline

MODEL_ID = "/content/OpenHermes-2.5-Mistral-7B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

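# Sampling settings (temperature, top_p, ...) come from the pipeline built above.
# Passing model_kwargs={"temperature": 0.0} here is not expected to change an
# already-built pipeline and contradicts temperature=0.7, so it is omitted.
llm = HuggingFacePipeline(pipeline=pipe)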

## 1. LLMChain (llm + prompt)
LLMChain is the most basic chain in LangChain: it fills a prompt template with the input variables and sends the result to the LLM.

In [None]:
# PromptTemplate
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# PROMPT
template="""[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
{question}[/INST]

"""

prompt = PromptTemplate(template=template, input_variables=["question"])

# Example prompt
question = "what are the 5 most populated cities in the world?"
print("\n\nExample prompt:")
print(prompt.format(question=question))

# Create the chain (prompt + model)
chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain
print("\n\nAnswer:")
print(chain.run(question=question))
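
The template above uses the Llama-2 `[INST]` / `<<SYS>>` markup from the quickstart listed at the top. The OpenHermes-2.5-Mistral-7B model card describes ChatML as its prompt format, so the sketch below shows what the same question could look like in ChatML. The `build_chatml_prompt` helper and the shortened system message are illustrative assumptions; the resulting string could be passed to `PromptTemplate` in place of the original template.

```python
def build_chatml_prompt(system_message: str) -> str:
    """Return a ChatML template string with a {question} placeholder."""
    return (
        "<|im_start|>system\n"
        f"{system_message}<|im_end|>\n"
        "<|im_start|>user\n"
        "{question}<|im_end|>\n"  # plain string, so the placeholder stays literal
        "<|im_start|>assistant\n"
    )

chatml_template = build_chatml_prompt(
    "You are a helpful, respectful and honest assistant."
)

# str.format fills the placeholder here, standing in for PromptTemplate.format
print(chatml_template.format(question="what are the 5 most populated cities in the world?"))
```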

## 2. Sequential Chain (llm + prompt)
A sequential chain combines two or more chains and runs them in order: the output of each chain becomes an input to the next.
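
The data flow can be sketched without LangChain. In this simplified stand-in (not the library's implementation), each step reads a named input from a shared dictionary and writes its result under an output key, which is how one step's output becomes available to the next. `run_sequential` and the two toy steps are hypothetical names.

```python
def run_sequential(steps, inputs):
    """Run steps in order; each step is (function, input_key, output_key)."""
    state = dict(inputs)
    for func, input_key, output_key in steps:
        state[output_key] = func(state[input_key])
    return state

def make_outline(topic):
    # Stand-in for an LLM call that drafts an outline
    return f"1. Intro to {topic}\n2. Highlights\n3. Summary"

def write_article(outline):
    # Stand-in for an LLM call that expands the outline
    return "Article based on:\n" + outline

steps = [
    (make_outline, "topic", "outline"),
    (write_article, "outline", "article"),
]

result = run_sequential(steps, {"topic": "Tainan"})
print(result["article"])
```

The returned dictionary holds the original input plus every intermediate output, which mirrors why the chain result below can be indexed by `topic`, `outline`, and `article`.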

In [None]:
# Load library
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chains import SequentialChain

# 1. TOPIC Template
template="""[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
Write a blog outline given a topic.
Topic: {topic}[/INST]

"""

prompt = PromptTemplate(template=template, input_variables=["topic"])

# TOPIC CHAIN
topic_chain = LLMChain(llm=llm, prompt=prompt, output_key="outline")


# 2. Outline Template
template="""[INST]Write a blog article based on the below outline.
Outline: 
{outline}[/INST]

"""

prompt = PromptTemplate(template=template, input_variables=["outline"])

# Outline CHAIN
outline_chain = LLMChain(llm=llm, prompt=prompt, output_key="article")


# Sequential Chain
overall_chain = SequentialChain(
    chains=[topic_chain, outline_chain],
    input_variables=["topic"],
    output_variables=["outline", "article"],
    verbose=True)

# Run the chain
result = overall_chain({"topic": "Tainan travel planning"})

# Print the results
print("\n\nTopic:")
print(result["topic"])
print("\n\nOutline:")
print(result["outline"])
print("\n\nArticle:")
print(result["article"])


## 3. RetrievalQA chain (model + prompt + documents + vectordb)

The RetrievalQA chain answers questions over your own documents: it retrieves the chunks most relevant to the query from a vector store and passes them to the LLM as context.
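
As a rough illustration of the retrieve-then-stuff idea, the sketch below ranks a few made-up text chunks against a query with a bag-of-words cosine similarity and then concatenates the best matches into a prompt. Real setups use learned embeddings and a vector store; the chunk texts, `retrieve`, and `stuff_prompt` are hypothetical.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = Counter(query.lower().split())
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(c.lower().split())), reverse=True)
    return ranked[:k]

def stuff_prompt(question, docs):
    """Concatenate the retrieved chunks into one context block ("stuff" style)."""
    context = "\n\n".join(docs)
    return f"{context}\n\n{question}"

chunks = [
    "The company reviews its asset inventory every quarter.",
    "Employees receive annual health checks.",
    "Asset disposal requires approval from the board.",
]
question = "How does the company manage asset disposal?"
top = retrieve(question, chunks, k=2)
print(stuff_prompt(question, top))
```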

In [None]:
!mkdir -p data/pdf/
!gdown 1AldhEWVCtcE50XARgSnXR0azZ965nNmT -O data/pdf/

In [None]:
# Load library
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# Parse the document
pdf_file='./data/pdf/e2729e76-29a0-4be5-9eef-67809b05d6b9.pdf'
loader= PyPDFLoader(pdf_file)
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
texts = text_splitter.split_documents(documents)

# Vector database
Embeddings_ID="sentence-transformers/all-MiniLM-L6-v2"
embeddings=HuggingFaceEmbeddings(model_name=Embeddings_ID)
vectortdb = Chroma.from_documents(texts, embeddings)
#DB_PATH = 'vectorstore/db_chroma'
#vectortdb = Chroma.from_documents(documents=texts, embedding=embeddings, persist_directory=DB_PATH)

# Load DB
#embeddings = OpenAIEmbeddings()
#DB_PATH = 'vectorstore/db_chroma'
#vectortdb = Chroma(persist_directory=DB_PATH, embedding_function=embeddings)

# Test a similarity search against the vector DB
# (query kept in its original Chinese; roughly "Please explain how the company manages its assets.")
query = "請說明櫃公司如何進行資產管理?"
source_documents = vectortdb.similarity_search(query, k=3)

for doc in source_documents:
    # PyPDFLoader page numbers are 0-based, so add 1 for display
    file = os.path.basename(doc.metadata["source"])
    print("Source: " + file + ", Page " + str(doc.metadata["page"] + 1))
    print(doc.page_content)
    print("\n\n")
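
The `chunk_size` and `chunk_overlap` settings used above can be pictured as a sliding window over characters. This is a deliberately simplified sketch: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line breaks, which this version ignores. `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

The overlap repeats the tail of one chunk at the head of the next, which reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.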

In [None]:
# Prompt template
template = """
<s>[INST] <<SYS>>
Act as a cryptocurrency expert. Use the following information to answer the question at the end.
<</SYS>>
 
{context}
 
{question} [/INST]
"""
 
prompt = PromptTemplate(template=template, input_variables=["context", "question"])


# Build the RetrievalQA chain
chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectortdb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},    
    verbose=True
)

# Search (query kept in its original Chinese; roughly "Please explain how the company manages its assets.")
query = "請說明櫃公司如何進行資產管理?"
llm_response = chain(query)
print("\n\nQuestion:")
print(llm_response['query'].strip())
print("\n\nAnswer:")
print(llm_response['result'].strip())
print("\n\nSources:")
print(llm_response['source_documents'])

## 4. RetrievalQAWithSourcesChain (model + prompt + documents + vectordb)
RetrievalQAWithSourcesChain works like RetrievalQA but also reports the sources it used: the response carries `answer`, `sources`, and, when `return_source_documents=True`, `source_documents`.
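
One practical detail: the `sources` field appears to be filled by looking for a `SOURCES:` marker in the model's raw output, so a custom prompt that never asks for that marker may leave `sources` empty even though `source_documents` is populated. The sketch below imitates that split with a regular expression; it is an approximation for illustration, not LangChain's code, and `split_answer_and_sources` is a hypothetical name.

```python
import re

def split_answer_and_sources(raw_output: str):
    """Split model output into (answer, sources) at the first 'SOURCES:' marker."""
    parts = re.split(r"SOURCES?:", raw_output, maxsplit=1, flags=re.IGNORECASE)
    answer = parts[0].strip()
    sources = parts[1].strip() if len(parts) > 1 else ""
    return answer, sources

print(split_answer_and_sources("Assets are reviewed quarterly.\nSOURCES: report.pdf"))
print(split_answer_and_sources("Assets are reviewed quarterly."))
```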

In [None]:
# Load library
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.prompts import PromptTemplate

# Parse the document
pdf_file='./data/pdf/e2729e76-29a0-4be5-9eef-67809b05d6b9.pdf'
loader= PyPDFLoader(pdf_file)
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
texts = text_splitter.split_documents(documents)

# Vector database
Embeddings_ID="sentence-transformers/all-MiniLM-L6-v2"
embeddings=HuggingFaceEmbeddings(model_name=Embeddings_ID)
vectortdb = Chroma.from_documents(texts, embeddings)
#DB_PATH = 'vectorstore/db_chroma'
#vectortdb = Chroma.from_documents(documents=texts, embedding=embeddings, persist_directory=DB_PATH)

# Load DB
#embeddings = OpenAIEmbeddings()
#DB_PATH = 'vectorstore/db_chroma'
#vectortdb = Chroma(persist_directory=DB_PATH, embedding_function=embeddings)

# Test a similarity search against the vector DB
# (query kept in its original Chinese; roughly "Please explain how the company manages its assets.")
query = "請說明櫃公司如何進行資產管理?"
source_documents = vectortdb.similarity_search(query, k=3)

for doc in source_documents:
    # PyPDFLoader page numbers are 0-based, so add 1 for display
    file = os.path.basename(doc.metadata["source"])
    print("Source: " + file + ", Page " + str(doc.metadata["page"] + 1))
    print(doc.page_content)
    print("\n\n")

In [None]:
# Prompt template for RetrievalQAWithSourcesChain (uses {summaries} instead of {context})
template = """
<s>[INST] <<SYS>>
Act as a cryptocurrency expert. Use the following information to answer the question at the end.
<</SYS>>
 
{summaries}
 
{question} [/INST]
"""    
    
prompt = PromptTemplate(template=template, input_variables=["summaries", "question"])


# Build the RetrievalQAWithSourcesChain
chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectortdb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},    
    verbose=True
)


# Search (query kept in its original Chinese; roughly "Please explain how the company manages its assets.")
query = "請說明櫃公司如何進行資產管理?"
llm_response = chain(query)
print("\n\nQuestion:")
print(llm_response['question'])
print("\n\nAnswer:")
print(llm_response['answer'])
print("\n\nSources:")
print(llm_response['sources'])
print("\n\nSource documents:")
print(llm_response['source_documents'])


In [None]:
# RetrievalQAWithSourcesChain: list the source file and page of each retrieved chunk
import os

for doc in llm_response['source_documents']:
    file = os.path.basename(doc.metadata["source"])
    # PyPDFLoader page numbers are 0-based, so add 1 to match the display used earlier
    print("SOURCE: " + file + ", PAGE: " + str(doc.metadata["page"] + 1))

## 5. Create Memory Chain

In [None]:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferWindowMemory

In [None]:
# Customize the LLM template ({history} is filled in by the memory object)
template = """Assistant is a helpful large language model.

{history}
Human: {human_input}
Assistant:"""

prompt = PromptTemplate(input_variables=["history", "human_input"], template=template)

print(prompt.format(human_input="my_human_input", history="my_history" ))

In [None]:
# Memory chain, turn 1: build the chain with a window memory that keeps the last 2 exchanges
chain = LLMChain(llm=llm, prompt=prompt, memory=ConversationBufferWindowMemory(k=2))

# Run the first turn through the memory chain
output = chain.run(human_input="Please list the key points of federated learning in order.")

# Display the model's response
print(output)

In [None]:
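# Memory chain, turn 2: refers back to the previous answer through the stored history
output = chain.run(human_input="Please draw a conclusion from the key points above.")

# Display the model's response
print(output)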

In [None]:
# Memory chain, turn 3: builds on the conclusion from turn 2
output = chain.run(human_input="Based on the summary above, plan the directions for future work.")

# Display the model's response
print(output)
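
`ConversationBufferWindowMemory(k=2)` keeps only the two most recent exchanges in `{history}`, so once a third exchange is stored, the first one drops out. A minimal sketch of that windowing with a `deque`; the `WindowMemory` class is a hypothetical stand-in, not the library's implementation, and the sample turns are made up.

```python
from collections import deque

class WindowMemory:
    """Keep only the k most recent (human, assistant) exchanges."""

    def __init__(self, k: int):
        self.turns = deque(maxlen=k)  # older exchanges are discarded automatically

    def save(self, human: str, assistant: str) -> None:
        self.turns.append((human, assistant))

    def history(self) -> str:
        # Render the stored exchanges as the text that would fill {history}
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

memory = WindowMemory(k=2)
memory.save("List the key points.", "Point A, point B.")
memory.save("Summarize them.", "A and B matter.")
memory.save("Plan next steps.", "Start with A.")
print(memory.history())
```

With three turns like the ones above, a fourth question would see only turns 2 and 3 in its history, so follow-ups that depend on the very first answer may need a larger `k`.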