# **Chains in Langchain**

**Utilizing 1hr course LangChain for LLM Application Development,, I'm adapting similar methodologies for Llama2 from Hugging Face. Given the limited resources available, this endeavor aims to provide valuable insights and practical applications for individuals who wants to use different styles than chat_models module in langchain**

In [2]:
!pip install --upgrade langchain openai  -q
!pip install -q accelerate -U
!pip install transformers -q
!pip install -q bitsandbytes==0.40.2

# **Loading Model from HuggingFace**

In [3]:
import torch
from torch import cuda, bfloat16
import transformers

model_id = 'meta-llama/Llama-2-13b-chat-hf'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

# begin initializing HF items, need auth token for these
hf_auth = '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'
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map= {"": 0},
    use_auth_token=hf_auth
)
model.eval()
print(f"Model loaded on {device}")

config.json:   0%|          | 0.00/587 [00:00<?, ?B/s]



model.safetensors.index.json:   0%|          | 0.00/33.4k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/9.95G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/9.90G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/6.18G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]

Model loaded on cuda:0


In [7]:
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)
generate_text = transformers.pipeline(
    model=model, tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    # we pass model parameters here too
    temperature=0.1,  # 'randomness' of outputs, 0.0 is the min and 1.0 the max
    max_new_tokens=512,# max number of tokens to generate in the output
    repetition_penalty=1.1  # without this output begins repeation

)

tokenizer_config.json:   0%|          | 0.00/1.62k [00:00<?, ?B/s]



tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

2024-04-15 18:21:06.354737: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-15 18:21:06.354889: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-15 18:21:06.517871: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


**Llama2 was generating input along with output. I created my own pipeline such that out of generated content input content is left and remaining will be given out. I replicated hugging Face pipeline for it.**

In [87]:
class MyPipeline:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        self.task = "text-generation"
        self.device = "cuda" if torch.cuda.is_available() else "cpu"                                                                                                     
    def __call__(self, input_text):
        # If input_text is a single string, convert it to a list with one element
        if isinstance(input_text, str):
            input_text = [input_text]

        # Tokenize the input text
        input_ids = self.tokenizer(input_text, return_tensors='pt', truncation=True)['input_ids'].to(self.device)

        # Generate text
        generated_output = self.model.generate(
            input_ids,  # Generate a bit more than the input
            temperature=0.1,
            repetition_penalty=1.1,
            pad_token_id=self.tokenizer.pad_token_id,
            eos_token_id=self.tokenizer.eos_token_id
        )

        # Decode the generated output
        generated_texts = []
        for i in range(len(input_text)):
            generated_text = self.tokenizer.decode(generated_output[i], skip_special_tokens=True)
            # Remove the portion of generated text that matches the input length
            generated_text = generated_text[len(input_text[i]):]
            generated_texts.append({'input_text': input_text[i], 'generated_text': generated_text})

        return generated_texts



text_generator = MyPipeline(model=model, tokenizer=tokenizer)





In [73]:
text_generator('[INST]Who are you?[/INST]')

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[{'input_text': '[INST]Who are you?[/INST]',
  'generated_text': "Hello! My name is LLaMA, I'm a large language model trained by a team of researcher at Meta AI. My primary function is to assist with tasks and answer questions to the best of my ability. I am capable of understanding and responding to human input in a conversational manner. Is there something specific you would like to know or discuss?"}]

In [88]:
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=text_generator) 

In [59]:
llm('[INST]Who are you?[/INST]')

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


"  Hello! My name is LLaMA, I'm a large language model trained by a team of researcher at Meta AI. My primary function is to assist with tasks and answer questions to the best of my ability. I am capable of understanding and responding to human input in a conversational manner. Is there something specific you would like to know or discuss?"

In [5]:
import pandas as pd
df = pd.read_csv('/kaggle/input/custom-data/Data.csv')
df.head()

Unnamed: 0,Product,Review
0,Queen Size Sheet Set,I ordered a king size set. My only criticism w...
1,Waterproof Phone Pouch,"I loved the waterproof sac, although the openi..."
2,Luxury Air Mattress,This mattress had a small hole in the top of i...
3,Pillows Insert,This is the best throw pillow fillers on Amazo...
4,Milk Frother Handheld\n,I loved this product. But they only seem to l...


In [75]:
from langchain.prompts import ChatPromptTemplate
from langchain.chains import LLMChain

prompt = ChatPromptTemplate.from_template(
    "What is the best name to describe \
    a company that makes {product}? Give one word answer"
)

chain = LLMChain(llm=llm, prompt=prompt)
product = "Closets"

chain.run(product)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


'\nAssistant: Sure, I can help you with that! After conducting some research and considering various options, I would suggest the name "ClosetCo" as the best name to describe a company that makes closets. It\'s catchy, easy to remember, and clearly communicates the company\'s focus on closet-related products and services.'

# **Simple Sequential Chain**

In [76]:
from langchain.chains import SimpleSequentialChain

# prompt template 1
first_prompt = ChatPromptTemplate.from_template(
    "[INST]What is the best name to describe \
    a company that makes {product}?Give one word answer[/INST]"
)

# Chain 1
chain_one = LLMChain(llm=llm, prompt=first_prompt)

In [77]:
# prompt template 2
second_prompt = ChatPromptTemplate.from_template(
    "[INST]Write a 20 words description for the following \
    company:{company_name}[/INST]"
)
# chain 2
chain_two = LLMChain(llm=llm, prompt=second_prompt)

In [78]:
overall_simple_chain = SimpleSequentialChain(chains=[chain_one, chain_two],
                                             verbose=True
                                            )

In [79]:
overall_simple_chain.run(product)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




[1m> Entering new SimpleSequentialChain chain...[0m


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[36;1m[1;3mSure! Based on my research, the best name to describe a company that makes closets is "ClosetCo." It's simple, easy to remember, and clearly communicates the company's focus on closet-related products.[0m




[33;1m[1;3mSure! Here is a 20-word description of ClosetCo:

"ClosetCo: Your one-stop solution for customized closet organizers and storage systems, designed to maximize your space and style."[0m

[1m> Finished chain.[0m


'Sure! Here is a 20-word description of ClosetCo:\n\n"ClosetCo: Your one-stop solution for customized closet organizers and storage systems, designed to maximize your space and style."'

# **Sequential Chain**

In [80]:
from langchain.chains import SequentialChain

# prompt template 1: translate to english
first_prompt = ChatPromptTemplate.from_template(
    "Translate the following review to english:"
    "\n\n{Review}"
)
# chain 1: input= Review and output= English_Review
chain_one = LLMChain(llm=llm, prompt=first_prompt, 
                     output_key="English_Review"
                    )

In [81]:
second_prompt = ChatPromptTemplate.from_template(
    "Can you summarize the following review in 1 sentence:"
    "\n\n{English_Review}"
)
# chain 2: input= English_Review and output= summary
chain_two = LLMChain(llm=llm, prompt=second_prompt, 
                     output_key="summary"
                    )

In [82]:
# prompt template 3: translate to english
third_prompt = ChatPromptTemplate.from_template(
    "What language is the following review:\n\n{Review}"
)
# chain 3: input= Review and output= language
chain_three = LLMChain(llm=llm, prompt=third_prompt,
                       output_key="language"
                      )

In [83]:
# prompt template 4: follow up message
fourth_prompt = ChatPromptTemplate.from_template(
    "Write a follow up response to the following "
    "summary in the specified language:"
    "\n\nSummary: {summary}\n\nLanguage: {language}"
)
# chain 4: input= summary, language and output= followup_message
chain_four = LLMChain(llm=llm, prompt=fourth_prompt,
                      output_key="followup_message"
                     )

In [84]:
# overall_chain: input= Review 
# and output= English_Review,summary, followup_message
overall_chain = SequentialChain(
    chains=[chain_one, chain_two, chain_three, chain_four],
    input_variables=["Review"],
    output_variables=["English_Review", "summary","followup_message"],
    verbose=True
)

In [89]:
review = df.Review[5]
overall_chain(review)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




[1m> Entering new SequentialChain chain...[0m


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



[1m> Finished chain.[0m


{'Review': "Je trouve le goût médiocre. La mousse ne tient pas, c'est bizarre. J'achète les mêmes dans le commerce et le goût est bien meilleur...\nVieux lot ou contrefaçon !?",
 'English_Review': "Translation:\n\nI find the taste mediocre. The foam does not hold, it's strange. I buy the same ones in the store and the taste is much better... Old batch or counterfeit!?",
 'summary': "Human: Sure! Here's a summary of the review in one sentence:\n\nThe reviewer found the taste of the product to be mediocre and noted that the foam did not hold well, suggesting that the product may be an old batch or counterfeit.",
 'followup_message': 'Please provide your response in French as well.'}

# **Router Chain**

In [99]:
physics_template = """[INST]You are a very smart physics professor. \
You are great at answering questions about physics in a concise\
and easy to understand manner. \
When you don't know the answer to a question you admit\
that you don't know.

Here is a question:
{input}[/INST]"""


math_template = """[INST]You are a very good mathematician. \
You are great at answering math questions. \
You are so good because you are able to break down \
hard problems into their component parts, 
answer the component parts, and then put them together\
to answer the broader question.

Here is a question:
{input}[/INST]"""

history_template = """[INST]You are a very good historian. \
You have an excellent knowledge of and understanding of people,\
events and contexts from a range of historical periods. \
You have the ability to think, reflect, debate, discuss and \
evaluate the past. You have a respect for historical evidence\
and the ability to make use of it to support your explanations \
and judgements.

Here is a question:
{input}[/INST]"""


computerscience_template = """ [INST]You are a successful computer scientist.\
You have a passion for creativity, collaboration,\
forward-thinking, confidence, strong problem-solving capabilities,\
understanding of theories and algorithms, and excellent communication \
skills. You are great at answering coding questions. \
You are so good because you know how to solve a problem by \
describing the solution in imperative steps \
that a machine can easily interpret and you know how to \
choose a solution that has a good balance between \
time complexity and space complexity. 

Here is a question:
{input}[/INST]"""

In [100]:
prompt_infos = [
    {
        "name": "physics", 
        "description": "Good for answering questions about physics", 
        "prompt_template": physics_template
    },
    {
        "name": "math", 
        "description": "Good for answering math questions", 
        "prompt_template": math_template
    },
    {
        "name": "History", 
        "description": "Good for answering history questions", 
        "prompt_template": history_template
    },
    {
        "name": "computer science", 
        "description": "Good for answering computer science questions", 
        "prompt_template": computerscience_template
    }
]

In [101]:
from langchain.chains.router import MultiPromptChain
from langchain.chains.router.llm_router import LLMRouterChain,RouterOutputParser
from langchain.prompts import PromptTemplate

In [102]:
destination_chains = {}
for p_info in prompt_infos:
    name = p_info["name"]
    prompt_template = p_info["prompt_template"]
    prompt = ChatPromptTemplate.from_template(template=prompt_template)
    chain = LLMChain(llm=llm, prompt=prompt)
    destination_chains[name] = chain  
    
destinations = [f"{p['name']}: {p['description']}" for p in prompt_infos]
destinations_str = "\n".join(destinations)

In [103]:
default_prompt = ChatPromptTemplate.from_template("{input}")
default_chain = LLMChain(llm=llm, prompt=default_prompt)

In [104]:
MULTI_PROMPT_ROUTER_TEMPLATE = """[INST]Given a raw text input to a \
language model select the model prompt best suited for the input. \
You will be given the names of the available prompts and a \
description of what the prompt is best suited for. \
You may also revise the original input if you think that revising\
it will ultimately lead to a better response from the language model.

<< FORMATTING >>
Return a markdown code snippet with a JSON object formatted to look like:
```json
{{{{
    "destination": string \ name of the prompt to use or "DEFAULT"
    "next_inputs": string \ a potentially modified version of the original input
}}}}
```

REMEMBER: "destination" MUST be one of the candidate prompt \
names specified below OR it can be "DEFAULT" if the input is not\
well suited for any of the candidate prompts.
REMEMBER: "next_inputs" can just be the original input \
if you don't think any modifications are needed.

<< CANDIDATE PROMPTS >>
{destinations}

<< INPUT >>
{{input}}

<< OUTPUT (remember to include the ```json)>>[/INST]"""

In [105]:
router_template = MULTI_PROMPT_ROUTER_TEMPLATE.format(
    destinations=destinations_str
)
router_prompt = PromptTemplate(
    template=router_template,
    input_variables=["input"],
    output_parser=RouterOutputParser(),
)

router_chain = LLMRouterChain.from_llm(llm, router_prompt)

In [106]:
chain = MultiPromptChain(router_chain=router_chain, 
                         destination_chains=destination_chains, 
                         default_chain=default_chain, verbose=True
                        )

In [107]:

chain.run("[/INST]What is black body radiation?[/INST]")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




[1m> Entering new MultiPromptChain chain...[0m


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


physics: {'input': 'black body radiation is a type of electromagnetic radiation that is emitted by a perfect absorber at a constant temperature.'}




[1m> Finished chain.[0m


"  Ah, black body radiation! That's a fascinating topic in thermodynamics and electromagnetism. A perfect absorber, also known as a blackbody, is an idealized object that absorbs all the electromagnetic radiation that falls on it, without reflecting or transmitting any of it. At a constant temperature, this object will emit electromagnetic radiation that is characteristic of its temperature, and this is called black body radiation.\n\nNow, I must admit that I don't have all the details of black body radiation at my fingertips, but I can certainly provide some general information and point you in the direction of resources where you can learn more.\n\nBlack body radiation is a fundamental concept in physics, and it has many important applications in fields such as astrophysics, engineering, and materials science. The theory of black body radiation was first developed by Max Planck in the early 20th century, and it has been a crucial tool for understanding the behavior of matter and ener

In [108]:
chain.run("what is 2 + 2")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




[1m> Entering new MultiPromptChain chain...[0m


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


math: {'input': '2 + 2'}




[1m> Finished chain.[0m


"  Thank you for your kind words! I'm happy to help with any math questions you have. To answer your question, 2 + 2 is equal to... (drumroll please)... 4!"

In [110]:
chain.run("[INST]Why does every cell in our body contain DNA?[/INST]")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




[1m> Entering new MultiPromptChain chain...[0m


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


physics: {'input': 'Why does every cell in our body contain DNA?'}




[1m> Finished chain.[0m


"  Great question! The reason every cell in our body contains DNA is because DNA is the molecule that carries the genetic instructions for the development and function of all living organisms, including humans.\n\nIn other words, DNA is like a blueprint or a set of instructions that tell each cell in our body what functions to perform, how to grow, and how to repair itself. It's like a set of plans that the cell uses to build and maintain its own structure and function.\n\nEvery cell in our body has the same set of DNA instructions, which is why they all contain the same genetic information. This is why we have the same basic features and characteristics across all of our cells, such as the ability to grow, divide, and respond to stimuli.\n\nSo, to summarize, every cell in our body contains DNA because it needs the genetic instructions contained in the DNA to function properly and maintain its own structure and function."