# Lab 3.2 : Using LangChain with IBM WatsonX

## 1. Intro to LangChain

[LangChain](https://docs.langchain.com/docs/) is an open-source development framework designed to simplify the creation of applications using large language models (LLMs).

The core idea of the library is that we can "chain" together different components to build more advanced use cases around LLMs. Here are the main components of LangChain:

- Model: interact with various LLMs
- Prompts: text that is sent to the LLMs
- Chains: combine different LLM calls and actions automatically
- Embeddings and Vector Stores: break large data into chunks and store those to be queried when relevant
- Agents: enable the LLMs to dynamically decide which tools to use in order to best respond to a given query

In short, **LangChain is a framework that can orchestrate a series of prompts to achieve a desired outcome.**


## 2. How to connect LangChain to WatsonX.ai

Foundation models on Watsonx.ai

In [33]:
# Models are referenced by their model ID (path)
mt_model = "bigscience/mt0-xxl"
llama2 = "meta-llama/llama-2-70b-chat"

In [2]:
import os
from dotenv import load_dotenv
from typing import Any, List, Mapping, Optional, Union, Dict
from pydantic import BaseModel, Extra
try:
    from langchain import PromptTemplate
    from langchain.chains import LLMChain, SimpleSequentialChain
    from langchain.document_loaders import PyPDFLoader
    from langchain.indexes import VectorstoreIndexCreator  # vector index backed by chromadb
    from langchain.embeddings import HuggingFaceEmbeddings  # for using Hugging Face embedding models
    from langchain.text_splitter import CharacterTextSplitter #text splitter
    from langchain.llms.base import LLM
    from langchain.llms.utils import enforce_stop_tokens
except ImportError:
    raise ImportError("Could not import langchain: please install the langchain package.")

from ibm_watson_machine_learning.foundation_models import Model
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams
from ibm_watson_machine_learning.foundation_models.utils.enums import DecodingMethods

In [3]:
#config Watsonx.ai environment
load_dotenv()
api_key = os.getenv("API_KEY", None)
ibm_cloud_url = os.getenv("IBM_CLOUD_URL", None)
project_id = os.getenv("PROJECT_ID", None)
if api_key is None or ibm_cloud_url is None or project_id is None:
    print("Ensure you copied the .env file that you created earlier into the same directory as this notebook")
else:
    creds = {
        "url": ibm_cloud_url,
        "apikey": api_key 
    }
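
The cell above only prints a message when a variable is missing, so `creds` stays undefined and the failure surfaces in a later cell. A stricter check could look like the sketch below, assuming the same three variable names; `load_watsonx_config` is a hypothetical helper, not part of any IBM or LangChain package.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = ("API_KEY", "IBM_CLOUD_URL", "PROJECT_ID")


def load_watsonx_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Return credentials and project ID, or raise if any variable is missing.

    Hypothetical helper for this lab; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Fail early with the names of the missing variables.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "creds": {"url": env["IBM_CLOUD_URL"], "apikey": env["API_KEY"]},
        "project_id": env["PROJECT_ID"],
    }
```

Calling `load_watsonx_config()` after `load_dotenv()` would then either return both values or stop the notebook with a message naming what is missing.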

In [27]:
##initializing model's parameters

params = {
    GenParams.DECODING_METHOD: DecodingMethods.SAMPLE,
    GenParams.MAX_NEW_TOKENS: 100,
    GenParams.MIN_NEW_TOKENS: 2,
    GenParams.TEMPERATURE: 0.5,
    GenParams.TOP_K: 50,
    GenParams.TOP_P: 1
}

To use WatsonX-based LLMs with LangChain, the LLM object must be an instance of a `BaseLanguageModel` subclass (see [LangChain docs](https://api.python.langchain.com/en/latest/schema/langchain.schema.language_model.BaseLanguageModel.html)). The custom class below accomplishes this by subclassing `LLM`.

In [25]:
# Wrap the WatsonX Model in a langchain.llms.base.LLM subclass to allow LangChain to interact with the model

class LangChainInterface(LLM, BaseModel):
    credentials: Optional[Dict] = None
    model: Optional[str] = None
    params: Optional[Dict] = None
    project_id : Optional[str]=None

    class Config:
        """Configuration for this pydantic object."""
        extra = Extra.forbid

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        _params = self.params or {}
        return {
            **{"model": self.model},
            **{"params": _params},
        }
    
    @property
    def _llm_type(self) -> str:
        """Return type of llm."""
        return "IBM WATSONX"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        """Call the WatsonX model"""
        params = self.params or {}
        model = Model(model_id=self.model, params=params, credentials=self.credentials, project_id=self.project_id)
        text = model.generate_text(prompt)
        if stop is not None:
            text = enforce_stop_tokens(text, stop)
        return text
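
The `stop` handling in `_call` cuts the generated text at the first stop sequence. The sketch below illustrates that idea with a stand-in function; `truncate_at_stop` is a hypothetical name, and LangChain's own `enforce_stop_tokens` may differ in detail (for example in how it treats special regex characters).

```python
import re
from typing import List, Optional


def truncate_at_stop(text: str, stop: Optional[List[str]]) -> str:
    """Cut `text` at the first occurrence of any stop sequence.

    Stand-in for illustration only, not LangChain's implementation.
    """
    if not stop:
        return text
    # Escape each stop sequence so it is matched literally.
    pattern = "|".join(re.escape(s) for s in stop)
    return re.split(pattern, text, maxsplit=1)[0]


print(truncate_at_stop("Seoul\nWhere is the capital of South Korea?", ["\n"]))
```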
    


In [41]:
## Predict with the model
# The LangChainInterface class wraps the call to the model, so we only need to
# pass the model ID defined earlier.

# Ask a simple question and see how each model responds.
model_list = [mt_model, llama2]
text = "Dimana ibukota dari Korea Selatan?"  # "Where is the capital of South Korea?"
for model_id in model_list:
    llm_model = LangChainInterface(model=model_id, credentials=creds, params=params, project_id=project_id)
    print(f"\nModel {model_id} memberikan hasil:")  # "Model ... gives the result:"
    print(llm_model(text))


Model bigscience/mt0-xxl memberikan hasil:
Seoul
Model meta-llama/llama-2-70b-chat memberikan hasil:

Where is the capital of South Korea?
The capital of South Korea is Seoul. Se


## 3. Prompt Templates & Chains

In the previous example, the user input was sent directly to the LLM. When using an LLM in an application, however, you will usually need to reuse the same prompt across multiple scenarios:

- Accepting user input and constructing a prompt from it
- Generating multiple prompts from a collection of data points in a dataset
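
Both scenarios come down to filling named placeholders in a fixed template. The sketch below shows that idea with plain `str.format`; `SimplePromptTemplate` is a hypothetical stand-in written for this lab, not LangChain's `PromptTemplate`.

```python
from typing import List


class SimplePromptTemplate:
    """Simplified stand-in for a prompt template (not LangChain's class)."""

    def __init__(self, input_variables: List[str], template: str) -> None:
        self.input_variables = input_variables
        self.template = template

    def format(self, **values: str) -> str:
        # Refuse to build a prompt if a declared variable was not supplied.
        missing = [name for name in self.input_variables if name not in values]
        if missing:
            raise KeyError(f"Missing values for: {missing}")
        return self.template.format(**values)


demo_prompt = SimplePromptTemplate(["country"], "Apa ibukota dari negara {country}?")
# One template, many prompts: one per data point.
demo_prompts = [demo_prompt.format(country=c) for c in ["USA", "Jepang"]]
print(demo_prompts)
```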

In [7]:
# Define the prompt templates
prompt = PromptTemplate(
  input_variables=["country"],
  template= "Apa ibukota dari negara {country}?",
)
llm_model = LangChainInterface(model=mt_model, credentials=creds, params=params, project_id=project_id)
# Chaining 
chain = LLMChain(llm=llm_model, prompt=prompt)

# Getting predictions
countries = ["USA", "Inggris", "Jepang", "Arab Saudi"]
for country in countries:
    response = chain.run(country)
    print(prompt.format(country=country) + " = " + response)

Apa ibukota dari negara USA? = Washington, D.C.
Apa ibukota dari negara Inggris? = London
Apa ibukota dari negara Jepang? = Tokyo
Apa ibukota dari negara Arab Saudi? = Riyadh


## 4. Simple sequential chains
The utility of LangChain becomes apparent when we chain the output of one model as the input to another. Here's a simple example where one model generates a question and the other answers it.

`SimpleSequentialChain` passes the single output of each chain as the single input of the next. In our example, the first prompt ends with "Pertanyaan:" ("Question:"), so the first model generates a question; LangChain then fills the `pertanyaan` ("question") input variable of the second prompt with that text and passes it to the second model.
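
The hand-off between the two steps can be sketched without any model call. In the example below, `run_sequential`, `fake_question_chain` and `fake_answer_chain` are hypothetical stand-ins that only mimic the data flow; no LLM is involved.

```python
from typing import Callable, List


def run_sequential(steps: List[Callable[[str], str]], first_input: str) -> str:
    """Feed the output of each step into the next and return the last output."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


# Hypothetical stand-ins for the two LLM chains (no model is called here).
def fake_question_chain(topic: str) -> str:
    return f"Apa yang dimaksud dengan {topic.lower()}?"


def fake_answer_chain(question: str) -> str:
    return f"(answer to: {question})"


result = run_sequential([fake_question_chain, fake_answer_chain], "Laptop")
print(result)
```

Each step takes exactly one string and returns exactly one string, which is why the variable names of the two prompts do not need to match.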

In [48]:
## Create two sequential prompts 
pt1 = PromptTemplate(input_variables=["topik"], 
                    template="Buat suatu pertanyaan yang berkaitan dengan topik {topik}: Pertanyaan: ")
pt2 = PromptTemplate(input_variables=["pertanyaan"],
                     template="Jawab pertanyaan berikut : {pertanyaan}"
)

In [49]:
question_model = LangChainInterface(model='bigscience/mt0-xxl', credentials=creds, params=params, project_id=project_id)
answer_model=  LangChainInterface(model='bigscience/mt0-xxl', credentials=creds, project_id=project_id)

In [50]:
prompt_to_question= LLMChain(llm=question_model, prompt=pt1)
question_to_answer = LLMChain(llm=answer_model, prompt=pt2)
qa = SimpleSequentialChain(chains=[prompt_to_question, question_to_answer], verbose=True)

In [51]:
qa.run("Laptop")



> Entering new SimpleSequentialChain chain...
Apa yang dimaksud dengan laptop?
Komputer portabel

> Finished chain.


'Komputer portabel'

## 5. Easy Loading of Documents Using LangChain
LangChain makes it easy to extract passages from documents so that you can answer questions based on your document's content.

In [3]:
pdf='pdfs/Machine_Learning.pdf'
loaders = [PyPDFLoader(pdf)]

In [13]:
index = VectorstoreIndexCreator(
    embedding=HuggingFaceEmbeddings(),
    text_splitter=CharacterTextSplitter(chunk_size=300, chunk_overlap=0),
).from_loaders(loaders)
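
To get a feel for what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified fixed-window chunker. It is an assumption-laden sketch: `chunk_text` is a hypothetical helper, and `CharacterTextSplitter` itself splits on a separator and merges pieces rather than cutting at fixed offsets.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Cut `text` into windows of `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters.
    Simplified illustration only, not LangChain's splitting logic.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


print(chunk_text("abcdefghij", chunk_size=4))
print(chunk_text("abcdef", chunk_size=4, chunk_overlap=2))
```

Smaller chunks give the retriever more precise passages; overlap reduces the chance that a sentence relevant to the question is split across two chunks.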

In [14]:
### Initialize the watsonx bigscience/mt0-xxl model
params = {
    GenParams.DECODING_METHOD: "sample",
    GenParams.MIN_NEW_TOKENS: 50,
    GenParams.MAX_NEW_TOKENS: 300,
    GenParams.TEMPERATURE: 0.2,
    GenParams.TOP_K: 100,
    GenParams.TOP_P:1
}

model = LangChainInterface(model="bigscience/mt0-xxl", credentials=creds, params=params, project_id=project_id)

In [15]:
from langchain.chains import RetrievalQA
chain = RetrievalQA.from_chain_type(llm=model, 
                                    chain_type="refine", 
                                    retriever=index.vectorstore.as_retriever(), 
                                    input_key="question")
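
The retriever's job is to pick the stored chunks whose embeddings are closest to the question's embedding. The sketch below shows that ranking step with cosine similarity on hand-made toy vectors; the vectors, chunk texts, and the `top_k` helper are all hypothetical, and the actual vector store may use a different distance metric.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          chunks: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(chunks,
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy 3-dimensional "embeddings", made up for illustration.
toy_chunks = [
    ("definition of machine learning", [0.9, 0.1, 0.0]),
    ("history of the university", [0.0, 0.2, 0.9]),
    ("training and testing phases", [0.7, 0.6, 0.1]),
]
print(top_k([1.0, 0.2, 0.0], toy_chunks, k=2))
```

In the lab, the retrieved chunks are what the `refine` chain feeds to the model, one after another, to build up the answer.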

In [16]:
##answering based on the documents 
chain.run("Apa yang dimaksud dengan Machine Learning?")

'Machine learning dapat didefinisikan sebagai aplikasi komputer dan algoritma matematika yang diadopsi dengan cara pembelajaran yang berasal dari data dan menghasilkan prediksi di masa yang akan datang (Goldberg & Holland, 1988) . Adapun proses pembelajaran yang dimaksud adalah suatu usaha dalam memperoleh kecerdasan yang melalui dua tahap an tara lain latihan ( training ) dan pengujian (testing) (Huang, Zhu, & Siew, 2006) . Machine learning dapat didefinisikan sebagai aplikasi komputer dan algoritma matematika yang diadopsi dengan cara pembelajaran yang berasal dari data dan menghasilkan prediksi di masa yang akan datang (Goldberg & Holland, 1988) . Adapun proses pembelajaran yang dimaksud adalah suatu usaha dalam memperoleh kecerdasan yang melalui dua tahap an tara lain latihan ( training ) dan pengujian (testing) (Huang, Zhu, & Siew, 2006) . Machine learning dapat didefinisikan sebagai aplikasi komputer dan algoritma matematika yang diadopsi dengan cara pembelajaran yang berasal dar

The model can produce an answer by searching the given document. To improve the quality of the output, try tuning the parameters or using a different model.

## 6. Additional: Prompt Instructions for Solving Problems in Bahasa Indonesia

In [136]:
# In this section we mostly experiment with the prompt template
question_template = '''
<s>[INST] <<SYS>>
INSTRUCTION:
Kamu adalah seseorang yang memiliki ketertarikan dengan tempat wisata yang ada di dunia. 
Tanyakan suatu pertanyaan terkait tempat wisata yang diajukan pada bagian 'TOPIK'.
Pertanyaan tersebut harus menggunakan bahasa Indonesia yang sesuai dengan PUEBI
Pastikan pertanyaan Hanya satu kalimat. Gunakan Contoh dibawah ini:
Topik: Jepang
Pertanyaan: Gunung yang terkenal di Jepang?
<</SYS>>
INPUT:
TOPIK:{topik}
[/INST] 
'''


answer_template = '''
<s>[INST] <<SYS>>
Kamu adalah seorang expert tempat wisata yang ada di dunia yang sudah pernah mengunjungi berbagai tempat di dunia. Akan ada pertanyaan untukmu terkait dengan tempat wisata.
Hanya langsung berikan jawaban yang diharapkan.
Gunakan contoh di bawah ini:
Pertanyaan: Gunung yang terkenal di Jepang?
Jawaban: Fuji
Pertanyaan: Pegunungan tertinggi di Dunia?
Jawaban: Himalaya
<</SYS>>
Pertanyaan: {pertanyaan}
[/INST] 
'''
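
Both templates above follow the same tag layout: a system block between `<<SYS>>` and `<</SYS>>`, then the user part, all wrapped in `[INST] ... [/INST]`. A small helper can keep that layout in one place. This is a sketch: `build_llama2_prompt` is a hypothetical name, and the exact whitespace between tags is an assumption.

```python
def build_llama2_prompt(system: str, user: str) -> str:
    """Wrap a system instruction and a user message in Llama 2 chat tags.

    Hypothetical helper; mirrors the tag order used in the templates above.
    """
    return f"<s>[INST] <<SYS>>\n{system.strip()}\n<</SYS>>\n\n{user.strip()} [/INST]"


# Placeholders such as {topik} pass through untouched, so the result can
# still be used as a template and filled in later.
demo_template = build_llama2_prompt("Kamu adalah pemandu wisata.", "TOPIK: {topik}")
print(demo_template.format(topik="Indonesia"))
```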

In [137]:
prompt_1 = PromptTemplate(
    input_variables=["topik"], 
    template=question_template
)

prompt_2 = PromptTemplate(
    input_variables=["pertanyaan"],
    template=answer_template,
)

In [146]:
parameters = {
    GenParams.DECODING_METHOD: "greedy",
    GenParams.MIN_NEW_TOKENS: 1,
    GenParams.MAX_NEW_TOKENS: 20,
    GenParams.STOP_SEQUENCES: ['\n']
}

# Use the greedy-decoding `parameters` defined in this cell for the question model
question_model = LangChainInterface(model=llama2, credentials=creds, params=parameters, project_id=project_id)
# No explicit parameters are passed to the answer model
answer_model = LangChainInterface(model=llama2, credentials=creds, project_id=project_id)

prompt_to_question= LLMChain(llm=question_model, prompt=prompt_1)
question_to_answer = LLMChain(llm=answer_model, prompt=prompt_2)
qa = SimpleSequentialChain(chains=[prompt_to_question, question_to_answer],verbose=True)

qa.run("Indonesia")



> Entering new SimpleSequentialChain chain...
Pertanyaan: Tempat wisata alam yang terkenal di Indonesia?
Jawaban: Taman Nasional Gunung Gede Pangrango

> Finished chain.


'Jawaban: Taman Nasional Gunung Gede Pangrango'
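
The final answer above still carries the "Jawaban:" label even though the prompt asks for the answer only. One option is to strip such labels after the chain returns. The sketch below assumes the labels used in this lab's templates; `strip_label` is a hypothetical helper.

```python
from typing import Sequence


def strip_label(text: str, labels: Sequence[str] = ("Jawaban:", "Pertanyaan:")) -> str:
    """Remove a leading label such as "Jawaban:" from a model response."""
    cleaned = text.strip()
    for label in labels:
        if cleaned.startswith(label):
            return cleaned[len(label):].strip()
    return cleaned


print(strip_label("Jawaban: Taman Nasional Gunung Gede Pangrango"))
```

Adjusting the prompt template is the other option; post-processing is simply more predictable when the model keeps echoing the format of the few-shot examples.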

## Congratulations

You have completed this lab. You may need to adjust some of the prompt templates to get better answers.